2021-06-10 15:16:03

by Quentin Perret

[permalink] [raw]
Subject: [PATCH v2 0/3] A few uclamp fixes

Hi all,

This series groups together the v2 of a few patches I orignally sent
independently from each others:

https://lore.kernel.org/r/[email protected]
https://lore.kernel.org/r/[email protected]
https://lore.kernel.org/r/[email protected]

But since they're all touching uclamp-related things, I figured I might
as well group them in a series.

The first one is a pure fix, and the two others change a bit the
sched_setattr behaviour for uclamp to make it more convenient to use,
and allow to put restrictions on the per-task API.

Changes since v1:
- fixed the CAP_SYS_NICE check to handle the uclamp_{min,max} = -1
cases correctly;
- fixed commit message of UCLAMP_FLAG_IDLE patch.

Thanks,
Quentin

Quentin Perret (3):
sched: Fix UCLAMP_FLAG_IDLE setting
sched: Skip priority checks with SCHED_FLAG_KEEP_PARAMS
sched: Make uclamp changes depend on CAP_SYS_NICE

kernel/sched/core.c | 64 +++++++++++++++++++++++++++++++++++++--------
1 file changed, 53 insertions(+), 11 deletions(-)

--
2.32.0.272.g935e593368-goog


2021-06-10 15:17:16

by Quentin Perret

[permalink] [raw]
Subject: [PATCH v2 2/3] sched: Skip priority checks with SCHED_FLAG_KEEP_PARAMS

SCHED_FLAG_KEEP_PARAMS can be passed to sched_setattr to specify that
the call must not touch scheduling parameters (nice or priority). This
is particularly handy for uclamp when used in conjunction with
SCHED_FLAG_KEEP_POLICY as that allows to issue a syscall that only
impacts uclamp values.

However, sched_setattr always checks whether the priorities and nice
values passed in sched_attr are valid first, even if those never get
used down the line. This is useless at best since userspace can
trivially bypass this check to set the uclamp values by specifying low
priorities. However, it is cumbersome to do so as there is no single
expression of this that skips both RT and CFS checks at once. As such,
userspace needs to query the task policy first with e.g. sched_getattr
and then set sched_attr.sched_priority accordingly. This is racy and
slower than a single call.

As the priority and nice checks are useless when SCHED_FLAG_KEEP_PARAMS
is specified, simply inherit them in this case to match the policy
inheritance of SCHED_FLAG_KEEP_POLICY.

Reported-by: Wei Wang <[email protected]>
Signed-off-by: Quentin Perret <[email protected]>
---
kernel/sched/core.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3b213402798e..1d4aedbbcf96 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6585,6 +6585,10 @@ SYSCALL_DEFINE3(sched_setattr, pid_t, pid, struct sched_attr __user *, uattr,
rcu_read_unlock();

if (likely(p)) {
+ if (attr.sched_flags & SCHED_FLAG_KEEP_PARAMS) {
+ attr.sched_priority = p->rt_priority;
+ attr.sched_nice = task_nice(p);
+ }
retval = sched_setattr(p, &attr);
put_task_struct(p);
}
--
2.32.0.272.g935e593368-goog

2021-06-11 09:03:56

by Quentin Perret

[permalink] [raw]
Subject: Re: [PATCH v2 2/3] sched: Skip priority checks with SCHED_FLAG_KEEP_PARAMS

On Thursday 10 Jun 2021 at 21:15:45 (+0200), Peter Zijlstra wrote:
> On Thu, Jun 10, 2021 at 03:13:05PM +0000, Quentin Perret wrote:
> > SCHED_FLAG_KEEP_PARAMS can be passed to sched_setattr to specify that
> > the call must not touch scheduling parameters (nice or priority). This
> > is particularly handy for uclamp when used in conjunction with
> > SCHED_FLAG_KEEP_POLICY as that allows to issue a syscall that only
> > impacts uclamp values.
> >
> > However, sched_setattr always checks whether the priorities and nice
> > values passed in sched_attr are valid first, even if those never get
> > used down the line. This is useless at best since userspace can
> > trivially bypass this check to set the uclamp values by specifying low
> > priorities. However, it is cumbersome to do so as there is no single
> > expression of this that skips both RT and CFS checks at once. As such,
> > userspace needs to query the task policy first with e.g. sched_getattr
> > and then set sched_attr.sched_priority accordingly. This is racy and
> > slower than a single call.
> >
> > As the priority and nice checks are useless when SCHED_FLAG_KEEP_PARAMS
> > is specified, simply inherit them in this case to match the policy
> > inheritance of SCHED_FLAG_KEEP_POLICY.
> >
> > Reported-by: Wei Wang <[email protected]>
> > Signed-off-by: Quentin Perret <[email protected]>
> > ---
> > kernel/sched/core.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 3b213402798e..1d4aedbbcf96 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -6585,6 +6585,10 @@ SYSCALL_DEFINE3(sched_setattr, pid_t, pid, struct sched_attr __user *, uattr,
> > rcu_read_unlock();
> >
> > if (likely(p)) {
> > + if (attr.sched_flags & SCHED_FLAG_KEEP_PARAMS) {
> > + attr.sched_priority = p->rt_priority;
> > + attr.sched_nice = task_nice(p);
> > + }
> > retval = sched_setattr(p, &attr);
> > put_task_struct(p);
> > }
>
> I don't like this much... afaict the KEEP_PARAMS clause in
> __setscheduler() also covers the DL params, and you 'forgot' to copy
> those.
>
> Can't we short circuit the validation logic?

I think we can but I didn't like the look of it, because we end up
sprinkling checks all over the place. KEEP_PARAMS doesn't imply
KEEP_POLICY IIUC, and the policy and params checks are all mixed up.

But maybe that wants fixing too? I guess it could make sense to switch
policies without touching the params in some cases (e.g switching
between FIFO and RR, or BATCH and NORMAL), but I'm not sure what that
would mean for cross-sched_class transitions.

2021-06-11 09:11:09

by Quentin Perret

[permalink] [raw]
Subject: Re: [PATCH v2 2/3] sched: Skip priority checks with SCHED_FLAG_KEEP_PARAMS

On Friday 11 Jun 2021 at 08:59:25 (+0000), Quentin Perret wrote:
> On Thursday 10 Jun 2021 at 21:15:45 (+0200), Peter Zijlstra wrote:
> > On Thu, Jun 10, 2021 at 03:13:05PM +0000, Quentin Perret wrote:
> > > SCHED_FLAG_KEEP_PARAMS can be passed to sched_setattr to specify that
> > > the call must not touch scheduling parameters (nice or priority). This
> > > is particularly handy for uclamp when used in conjunction with
> > > SCHED_FLAG_KEEP_POLICY as that allows to issue a syscall that only
> > > impacts uclamp values.
> > >
> > > However, sched_setattr always checks whether the priorities and nice
> > > values passed in sched_attr are valid first, even if those never get
> > > used down the line. This is useless at best since userspace can
> > > trivially bypass this check to set the uclamp values by specifying low
> > > priorities. However, it is cumbersome to do so as there is no single
> > > expression of this that skips both RT and CFS checks at once. As such,
> > > userspace needs to query the task policy first with e.g. sched_getattr
> > > and then set sched_attr.sched_priority accordingly. This is racy and
> > > slower than a single call.
> > >
> > > As the priority and nice checks are useless when SCHED_FLAG_KEEP_PARAMS
> > > is specified, simply inherit them in this case to match the policy
> > > inheritance of SCHED_FLAG_KEEP_POLICY.
> > >
> > > Reported-by: Wei Wang <[email protected]>
> > > Signed-off-by: Quentin Perret <[email protected]>
> > > ---
> > > kernel/sched/core.c | 4 ++++
> > > 1 file changed, 4 insertions(+)
> > >
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index 3b213402798e..1d4aedbbcf96 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -6585,6 +6585,10 @@ SYSCALL_DEFINE3(sched_setattr, pid_t, pid, struct sched_attr __user *, uattr,
> > > rcu_read_unlock();
> > >
> > > if (likely(p)) {
> > > + if (attr.sched_flags & SCHED_FLAG_KEEP_PARAMS) {
> > > + attr.sched_priority = p->rt_priority;
> > > + attr.sched_nice = task_nice(p);
> > > + }
> > > retval = sched_setattr(p, &attr);
> > > put_task_struct(p);
> > > }
> >
> > I don't like this much... afaict the KEEP_PARAMS clause in
> > __setscheduler() also covers the DL params, and you 'forgot' to copy
> > those.
> >
> > Can't we short circuit the validation logic?
>
> I think we can but I didn't like the look of it, because we end up
> sprinkling checks all over the place. KEEP_PARAMS doesn't imply
> KEEP_POLICY IIUC, and the policy and params checks are all mixed up.
>
> But maybe that wants fixing too? I guess it could make sense to switch
> policies without touching the params in some cases (e.g switching
> between FIFO and RR, or BATCH and NORMAL), but I'm not sure what that
> would mean for cross-sched_class transitions.

Aha, policy transitions are actually blocked in __setscheduler if
KEEP_PARAMS is set, so KEEP_PARAMS does imply KEEP_POLICY. So skipping
the checks might not be too bad, I'll have a go at it.