2011-02-22 21:04:48

by Darren Hart

[permalink] [raw]
Subject: [PATCH 0/2] sched: SCHED_BATCH fixes

The following patches tweak the scheduling behavior surrounding SCHED_BATCH
tasks slightly. While experimenting on ways to improve performance of a build
system I stumbled across what I thought to be oversights in the implementation.
Specifically with respect to rtprio privileges and batch-idle task interaction.

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel


2011-02-22 21:04:55

by Darren Hart

[permalink] [raw]
Subject: [PATCH 1/2] sched: allow SCHED_BATCH to preempt SCHED_IDLE tasks

Perform the test for SCHED_IDLE before testing for SCHED_BATCH (and ensure idle
tasks don't preempt idle tasks) so the non-interactive, but still important,
SCHED_BATCH tasks will run in favor of the very low priority SCHED_IDLE tasks.

Signed-off-by: Darren Hart <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Richard Purdie <[email protected]>
---
kernel/sched_fair.c | 12 +++++++-----
1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 0c26e2d..ff04bbd 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1857,16 +1857,18 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
if (test_tsk_need_resched(curr))
return;

+ /* Idle tasks are by definition preempted by non-idle tasks. */
+ if (unlikely(curr->policy == SCHED_IDLE) &&
+ likely(p->policy != SCHED_IDLE))
+ goto preempt;
+
/*
- * Batch and idle tasks do not preempt (their preemption is driven by
- * the tick):
+ * Batch and idle tasks do not preempt non-idle tasks (their preemption
+ * is driven by the tick):
*/
if (unlikely(p->policy != SCHED_NORMAL))
return;

- /* Idle tasks are by definition preempted by everybody. */
- if (unlikely(curr->policy == SCHED_IDLE))
- goto preempt;

if (!sched_feat(WAKEUP_PREEMPT))
return;
--
1.7.1

2011-02-22 21:05:00

by Darren Hart

[permalink] [raw]
Subject: [PATCH 2/2] sched: allow users with rtprio rlimit to change from SCHED_IDLE policy

As it stands, users with rtprio rlimit permissions can change their policy from
SCHED_OTHER to SCHED_FIFO and back. They can change to SCHED_IDLE, but not back
to SCHED_FIFO. If they have the rtprio permission, they should be able to. Once
in SCHED_FIFO, they could go back to SCHED_OTHER. This patch allows users with
rtprio permission to change out of SCHED_IDLE.

This patch allows the following test-case to pass for users with rtprio
permissions with or without the ifdef block:

int main()
{
int ret;
struct sched_param sp;

/* switch to SCHED_FIFO */
sp.sched_priority = 1;
ret = sched_setscheduler(0, SCHED_FIFO, &sp);
printf("setscheduler FIFO: %d\n", ret);
if (ret) return ret;

/* switch to SCHED_IDLE */
sp.sched_priority = 0;
ret = sched_setscheduler(0, SCHED_IDLE, &sp);
printf("setscheduler IDLE: %d\n", ret);
if (ret) return ret;

/* switch to SCHED_FIFO */
sp.sched_priority = 1;
ret = sched_setscheduler(0, SCHED_FIFO, &sp);
printf("setscheduler FIFO: %d\n", ret);
if (ret) return ret;
/* switch back to SCHED_OTHER */
sp.sched_priority = 0;
ret = sched_setscheduler(0, SCHED_OTHER, &sp);
printf("setscheduler OTHER: %d\n", ret);

return ret;
}

Signed-off-by: Darren Hart <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Richard Purdie <[email protected]>
---
kernel/sched.c | 12 ++++++++----
1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 18d38e4..ec6943e 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4822,12 +4822,16 @@ recheck:
param->sched_priority > rlim_rtprio)
return -EPERM;
}
+
/*
- * Like positive nice levels, dont allow tasks to
- * move out of SCHED_IDLE either:
+ * Like positive nice levels, don't allow tasks to move out of
+ * SCHED_IDLE either. Make an exception if the user can set rt
+ * policy normally.
*/
- if (p->policy == SCHED_IDLE && policy != SCHED_IDLE)
- return -EPERM;
+ if (p->policy == SCHED_IDLE && policy != SCHED_IDLE) {
+ if (!task_rlimit(p, RLIMIT_RTPRIO))
+ return -EPERM;
+ }

/* can't change other user's priorities */
if (!check_same_owner(p))
--
1.7.1

2011-02-23 04:21:00

by Mike Galbraith

[permalink] [raw]
Subject: Re: [PATCH 1/2] sched: allow SCHED_BATCH to preempt SCHED_IDLE tasks

On Tue, 2011-02-22 at 13:04 -0800, Darren Hart wrote:
> Perform the test for SCHED_IDLE before testing for SCHED_BATCH (and ensure idle
> tasks don't preempt idle tasks) so the non-interactive, but still important,
> SCHED_BATCH tasks will run in favor of the very low priority SCHED_IDLE tasks.

Yeah, that could be construed as a fairness fix for light SCHED_BATCH vs
a heavy SCHED_IDLE. It should lower latencies for both when mixed.

Acked-by: Mike Galbraith <[email protected]>

Nit below.

> Signed-off-by: Darren Hart <[email protected]>
> CC: Peter Zijlstra <[email protected]>
> CC: Ingo Molnar <[email protected]>
> CC: Richard Purdie <[email protected]>
> ---
> kernel/sched_fair.c | 12 +++++++-----
> 1 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 0c26e2d..ff04bbd 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -1857,16 +1857,18 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
> if (test_tsk_need_resched(curr))
> return;
>
> + /* Idle tasks are by definition preempted by non-idle tasks. */
> + if (unlikely(curr->policy == SCHED_IDLE) &&
> + likely(p->policy != SCHED_IDLE))
> + goto preempt;
> +

if (unlikely(curr->policy == SCHED_IDLE && p->policy != curr->policy))
goto preempt;

Looks better to me.

-Mike

2011-02-23 05:32:05

by Mike Galbraith

[permalink] [raw]
Subject: Re: [PATCH 1/2] sched: allow SCHED_BATCH to preempt SCHED_IDLE tasks

On Wed, 2011-02-23 at 05:20 +0100, Mike Galbraith wrote:
> On Tue, 2011-02-22 at 13:04 -0800, Darren Hart wrote:
> > Perform the test for SCHED_IDLE before testing for SCHED_BATCH (and ensure idle
> > tasks don't preempt idle tasks) so the non-interactive, but still important,
> > SCHED_BATCH tasks will run in favor of the very low priority SCHED_IDLE tasks.
>
> Yeah, that could be construed as a fairness fix for light SCHED_BATCH vs
> a heavy SCHED_IDLE. It should lower latencies for both when mixed.

Hm. Seems SCHED_IDLE _always_ being preempted is a potential terminal
starvation bug though, unless preempt_tick() checks spread to guarantee
that the preempted task will eventually get the CPU back, even in the
face of massive non-idle wakeup driven load.. which it does not. (idle
task holds resource?)

Maybe my imagination has had too much java though.

-Mike

2011-02-23 05:34:16

by Darren Hart

[permalink] [raw]
Subject: Re: [PATCH 1/2] sched: allow SCHED_BATCH to preempt SCHED_IDLE tasks

On 02/22/2011 08:20 PM, Mike Galbraith wrote:
> On Tue, 2011-02-22 at 13:04 -0800, Darren Hart wrote:
>> Perform the test for SCHED_IDLE before testing for SCHED_BATCH (and ensure idle
>> tasks don't preempt idle tasks) so the non-interactive, but still important,
>> SCHED_BATCH tasks will run in favor of the very low priority SCHED_IDLE tasks.
>
> Yeah, that could be construed as a fairness fix for light SCHED_BATCH vs
> a heavy SCHED_IDLE. It should lower latencies for both when mixed.
>
> Acked-by: Mike Galbraith<[email protected]>
>
> Nit below.
>
>> Signed-off-by: Darren Hart<[email protected]>
>> CC: Peter Zijlstra<[email protected]>
>> CC: Ingo Molnar<[email protected]>
>> CC: Richard Purdie<[email protected]>
>> ---
>> kernel/sched_fair.c | 12 +++++++-----
>> 1 files changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
>> index 0c26e2d..ff04bbd 100644
>> --- a/kernel/sched_fair.c
>> +++ b/kernel/sched_fair.c
>> @@ -1857,16 +1857,18 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
>> if (test_tsk_need_resched(curr))
>> return;
>>
>> + /* Idle tasks are by definition preempted by non-idle tasks. */
>> + if (unlikely(curr->policy == SCHED_IDLE)&&
>> + likely(p->policy != SCHED_IDLE))
>> + goto preempt;
>> +
>
> if (unlikely(curr->policy == SCHED_IDLE&& p->policy != curr->policy))
> goto preempt;
>
> Looks better to me.

I have no opinion on the unlikely/likely optimizations. I chose the way
I did as I thought it was more consistent with the existing code. I'll
leave that to Peter and Ingo - let me know if I should resend.

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel

2011-02-23 11:03:23

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2/2] sched: allow users with rtprio rlimit to change from SCHED_IDLE policy

On Tue, 2011-02-22 at 13:04 -0800, Darren Hart wrote:
> As it stands, users with rtprio rlimit permissions can change their policy from
> SCHED_OTHER to SCHED_FIFO and back. They can change to SCHED_IDLE, but not back
> to SCHED_FIFO. If they have the rtprio permission, they should be able to. Once
> in SCHED_FIFO, they could go back to SCHED_OTHER. This patch allows users with
> rtprio permission to change out of SCHED_IDLE.
>

Ingo, can you remember the rationale for this?

The fact is that SCHED_IDLE is very near nice-20, and we can do:

peterz@twins:~$ renice 5 -p $$
1867: old priority 0, new priority 5
peterz@twins:~$ renice 0 -p $$
1867: old priority 5, new priority 0

Which would suggest that we should be able to return to SCHED_OTHER
RLIMIT_NICE-20.

Hmm?

2011-02-23 11:14:19

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/2] sched: allow users with rtprio rlimit to change from SCHED_IDLE policy


* Peter Zijlstra <[email protected]> wrote:

> On Tue, 2011-02-22 at 13:04 -0800, Darren Hart wrote:
> > As it stands, users with rtprio rlimit permissions can change their policy from
> > SCHED_OTHER to SCHED_FIFO and back. They can change to SCHED_IDLE, but not back
> > to SCHED_FIFO. If they have the rtprio permission, they should be able to. Once
> > in SCHED_FIFO, they could go back to SCHED_OTHER. This patch allows users with
> > rtprio permission to change out of SCHED_IDLE.
> >
>
> Ingo, can you remember the rationale for this?
>
> The fact is that SCHED_IDLE is very near nice-20, and we can do:
>
> peterz@twins:~$ renice 5 -p $$
> 1867: old priority 0, new priority 5
> peterz@twins:~$ renice 0 -p $$
> 1867: old priority 5, new priority 0
>
> Which would suggest that we should be able to return to SCHED_OTHER
> RLIMIT_NICE-20.

I dont remember anything subtle there - most likely we just forgot about that spot
when adding RLIMIT_RTPRIO support.

Thanks,

Ingo

2011-02-23 11:17:16

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2/2] sched: allow users with rtprio rlimit to change from SCHED_IDLE policy

On Wed, 2011-02-23 at 12:13 +0100, Ingo Molnar wrote:
> * Peter Zijlstra <[email protected]> wrote:
>
> > On Tue, 2011-02-22 at 13:04 -0800, Darren Hart wrote:
> > > As it stands, users with rtprio rlimit permissions can change their policy from
> > > SCHED_OTHER to SCHED_FIFO and back. They can change to SCHED_IDLE, but not back
> > > to SCHED_FIFO. If they have the rtprio permission, they should be able to. Once
> > > in SCHED_FIFO, they could go back to SCHED_OTHER. This patch allows users with
> > > rtprio permission to change out of SCHED_IDLE.
> > >
> >
> > Ingo, can you remember the rationale for this?
> >
> > The fact is that SCHED_IDLE is very near nice-20, and we can do:
> >
> > peterz@twins:~$ renice 5 -p $$
> > 1867: old priority 0, new priority 5
> > peterz@twins:~$ renice 0 -p $$
> > 1867: old priority 5, new priority 0
> >
> > Which would suggest that we should be able to return to SCHED_OTHER
> > RLIMIT_NICE-20.
>
> I dont remember anything subtle there - most likely we just forgot about that spot
> when adding RLIMIT_RTPRIO support.

Ah, I was arguing we should allow it regardless of RLIMIT_RTPRIO, based
on RLIMIT_NICE, it is after all a change to SCHED_OTHER, not
SCHED_FIFO/RR.

2011-02-23 11:35:36

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/2] sched: allow users with rtprio rlimit to change from SCHED_IDLE policy


* Peter Zijlstra <[email protected]> wrote:

> On Wed, 2011-02-23 at 12:13 +0100, Ingo Molnar wrote:
> > * Peter Zijlstra <[email protected]> wrote:
> >
> > > On Tue, 2011-02-22 at 13:04 -0800, Darren Hart wrote:
> > > > As it stands, users with rtprio rlimit permissions can change their policy from
> > > > SCHED_OTHER to SCHED_FIFO and back. They can change to SCHED_IDLE, but not back
> > > > to SCHED_FIFO. If they have the rtprio permission, they should be able to. Once
> > > > in SCHED_FIFO, they could go back to SCHED_OTHER. This patch allows users with
> > > > rtprio permission to change out of SCHED_IDLE.
> > > >
> > >
> > > Ingo, can you remember the rationale for this?
> > >
> > > The fact is that SCHED_IDLE is very near nice-20, and we can do:
> > >
> > > peterz@twins:~$ renice 5 -p $$
> > > 1867: old priority 0, new priority 5
> > > peterz@twins:~$ renice 0 -p $$
> > > 1867: old priority 5, new priority 0
> > >
> > > Which would suggest that we should be able to return to SCHED_OTHER
> > > RLIMIT_NICE-20.
> >
> > I dont remember anything subtle there - most likely we just forgot about that spot
> > when adding RLIMIT_RTPRIO support.
>
> Ah, I was arguing we should allow it regardless of RLIMIT_RTPRIO, based on
> RLIMIT_NICE, it is after all a change to SCHED_OTHER, not SCHED_FIFO/RR.

Sure. We just went for the most restrictive conditions - it's hard to add security
holes that way ;-)

Ingo

2011-02-23 15:52:42

by Darren Hart

[permalink] [raw]
Subject: Re: [PATCH 2/2] sched: allow users with rtprio rlimit to change from SCHED_IDLE policy

On 02/23/2011 03:17 AM, Peter Zijlstra wrote:
> On Wed, 2011-02-23 at 12:13 +0100, Ingo Molnar wrote:
>> * Peter Zijlstra<[email protected]> wrote:
>>
>>> On Tue, 2011-02-22 at 13:04 -0800, Darren Hart wrote:
>>>> As it stands, users with rtprio rlimit permissions can change their policy from
>>>> SCHED_OTHER to SCHED_FIFO and back. They can change to SCHED_IDLE, but not back
>>>> to SCHED_FIFO. If they have the rtprio permission, they should be able to. Once
>>>> in SCHED_FIFO, they could go back to SCHED_OTHER. This patch allows users with
>>>> rtprio permission to change out of SCHED_IDLE.
>>>>
>>>
>>> Ingo, can you remember the rationale for this?
>>>
>>> The fact is that SCHED_IDLE is very near nice-20, and we can do:
>>>
>>> peterz@twins:~$ renice 5 -p $$
>>> 1867: old priority 0, new priority 5
>>> peterz@twins:~$ renice 0 -p $$
>>> 1867: old priority 5, new priority 0
>>>
>>> Which would suggest that we should be able to return to SCHED_OTHER
>>> RLIMIT_NICE-20.
>>
>> I dont remember anything subtle there - most likely we just forgot about that spot
>> when adding RLIMIT_RTPRIO support.
>
> Ah, I was arguing we should allow it regardless of RLIMIT_RTPRIO, based
> on RLIMIT_NICE, it is after all a change to SCHED_OTHER, not
> SCHED_FIFO/RR.

So we need an OR test of RLIMIT_NICE | RLIMIT_RTPRIO ? The reason I keep
coming back to RTPRIO is it allows the user to change to
SCHED_(FIFO|RR), and from there they can change to anything they want -
so why force two steps? Perhaps the argument is to keep the meaning of
the RLIMITs precise, and if you want to go from IDLE->OTHER you had
better properly set RLIMIT_NICE - maybe I just convinced myself.

Shall I respin the patch to reflect that?


--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel

2011-02-23 16:00:16

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2/2] sched: allow users with rtprio rlimit to change from SCHED_IDLE policy

On Wed, 2011-02-23 at 07:52 -0800, Darren Hart wrote:
> On 02/23/2011 03:17 AM, Peter Zijlstra wrote:
> > On Wed, 2011-02-23 at 12:13 +0100, Ingo Molnar wrote:
> >> * Peter Zijlstra<[email protected]> wrote:
> >>
> >>> On Tue, 2011-02-22 at 13:04 -0800, Darren Hart wrote:
> >>>> As it stands, users with rtprio rlimit permissions can change their policy from
> >>>> SCHED_OTHER to SCHED_FIFO and back. They can change to SCHED_IDLE, but not back
> >>>> to SCHED_FIFO. If they have the rtprio permission, they should be able to. Once
> >>>> in SCHED_FIFO, they could go back to SCHED_OTHER. This patch allows users with
> >>>> rtprio permission to change out of SCHED_IDLE.
> >>>>
> >>>
> >>> Ingo, can you remember the rationale for this?
> >>>
> >>> The fact is that SCHED_IDLE is very near nice-20, and we can do:
> >>>
> >>> peterz@twins:~$ renice 5 -p $$
> >>> 1867: old priority 0, new priority 5
> >>> peterz@twins:~$ renice 0 -p $$
> >>> 1867: old priority 5, new priority 0
> >>>
> >>> Which would suggest that we should be able to return to SCHED_OTHER
> >>> RLIMIT_NICE-20.
> >>
> >> I dont remember anything subtle there - most likely we just forgot about that spot
> >> when adding RLIMIT_RTPRIO support.
> >
> > Ah, I was arguing we should allow it regardless of RLIMIT_RTPRIO, based
> > on RLIMIT_NICE, it is after all a change to SCHED_OTHER, not
> > SCHED_FIFO/RR.
>
> So we need an OR test of RLIMIT_NICE | RLIMIT_RTPRIO ?

Just RLIMIT_NICE I think.

> The reason I keep
> coming back to RTPRIO is it allows the user to change to
> SCHED_(FIFO|RR), and from there they can change to anything they want -

Hmm,. is that so? I would think that even if you're SCHED_FIFO changing
back to SCHED_OTHER ought to make you respect RLIMIT_NICE.

That is, even if you're a SCHED_FIFO-1 task due to RLIMIT_RTPRIO, when
you switch back to SCHED_OTHER I would expect you not to be able to
switch to a lower nice than RLIMIT_NICE-20.

Now, if this isn't actually so I think we ought to make it so.

> so why force two steps? Perhaps the argument is to keep the meaning of
> the RLIMITs precise, and if you want to go from IDLE->OTHER you had
> better properly set RLIMIT_NICE - maybe I just convinced myself.

Right

> Shall I respin the patch to reflect that?

Please.

2011-02-23 16:07:33

by Darren Hart

[permalink] [raw]
Subject: Re: [PATCH 2/2] sched: allow users with rtprio rlimit to change from SCHED_IDLE policy

On 02/23/2011 08:00 AM, Peter Zijlstra wrote:
> On Wed, 2011-02-23 at 07:52 -0800, Darren Hart wrote:
>> On 02/23/2011 03:17 AM, Peter Zijlstra wrote:
>>> On Wed, 2011-02-23 at 12:13 +0100, Ingo Molnar wrote:
>>>> * Peter Zijlstra<[email protected]> wrote:
>>>>
>>>>> On Tue, 2011-02-22 at 13:04 -0800, Darren Hart wrote:
>>>>>> As it stands, users with rtprio rlimit permissions can change their policy from
>>>>>> SCHED_OTHER to SCHED_FIFO and back. They can change to SCHED_IDLE, but not back
>>>>>> to SCHED_FIFO. If they have the rtprio permission, they should be able to. Once
>>>>>> in SCHED_FIFO, they could go back to SCHED_OTHER. This patch allows users with
>>>>>> rtprio permission to change out of SCHED_IDLE.
>>>>>>
>>>>>
>>>>> Ingo, can you remember the rationale for this?
>>>>>
>>>>> The fact is that SCHED_IDLE is very near nice-20, and we can do:
>>>>>
>>>>> peterz@twins:~$ renice 5 -p $$
>>>>> 1867: old priority 0, new priority 5
>>>>> peterz@twins:~$ renice 0 -p $$
>>>>> 1867: old priority 5, new priority 0
>>>>>
>>>>> Which would suggest that we should be able to return to SCHED_OTHER
>>>>> RLIMIT_NICE-20.
>>>>
>>>> I dont remember anything subtle there - most likely we just forgot about that spot
>>>> when adding RLIMIT_RTPRIO support.
>>>
>>> Ah, I was arguing we should allow it regardless of RLIMIT_RTPRIO, based
>>> on RLIMIT_NICE, it is after all a change to SCHED_OTHER, not
>>> SCHED_FIFO/RR.
>>
>> So we need an OR test of RLIMIT_NICE | RLIMIT_RTPRIO ?
>
> Just RLIMIT_NICE I think.
>
>> The reason I keep
>> coming back to RTPRIO is it allows the user to change to
>> SCHED_(FIFO|RR), and from there they can change to anything they want -
>
> Hmm,. is that so? I would think that even if you're SCHED_FIFO changing
> back to SCHED_OTHER ought to make you respect RLIMIT_NICE.
>
> That is, even if you're a SCHED_FIFO-1 task due to RLIMIT_RTPRIO, when
> you switch back to SCHED_OTHER I would expect you not to be able to
> switch to a lower nice than RLIMIT_NICE-20.
>
> Now, if this isn't actually so I think we ought to make it so.

I was just thinking in terms of POLICY, not priority or nice level. I'll
do some experimenting and see where the limits are.

>
>> so why force two steps? Perhaps the argument is to keep the meaning of
>> the RLIMITs precise, and if you want to go from IDLE->OTHER you had
>> better properly set RLIMIT_NICE - maybe I just convinced myself.
>
> Right
>
>> Shall I respin the patch to reflect that?
>
> Please.

Will do.

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel

2011-02-23 21:28:17

by Darren Hart

[permalink] [raw]
Subject: Re: [PATCH 2/2] sched: allow users with rtprio rlimit to change from SCHED_IDLE policy

On 02/23/2011 08:00 AM, Peter Zijlstra wrote:
> On Wed, 2011-02-23 at 07:52 -0800, Darren Hart wrote:
>> On 02/23/2011 03:17 AM, Peter Zijlstra wrote:
>>> On Wed, 2011-02-23 at 12:13 +0100, Ingo Molnar wrote:
>>>> * Peter Zijlstra<[email protected]> wrote:
>>>>
>>>>> On Tue, 2011-02-22 at 13:04 -0800, Darren Hart wrote:
>>>>>> As it stands, users with rtprio rlimit permissions can change their policy from
>>>>>> SCHED_OTHER to SCHED_FIFO and back. They can change to SCHED_IDLE, but not back
>>>>>> to SCHED_FIFO. If they have the rtprio permission, they should be able to. Once
>>>>>> in SCHED_FIFO, they could go back to SCHED_OTHER. This patch allows users with
>>>>>> rtprio permission to change out of SCHED_IDLE.
>>>>>>
>>>>>
>>>>> Ingo, can you remember the rationale for this?
>>>>>
>>>>> The fact is that SCHED_IDLE is very near nice-20, and we can do:
>>>>>
>>>>> peterz@twins:~$ renice 5 -p $$
>>>>> 1867: old priority 0, new priority 5
>>>>> peterz@twins:~$ renice 0 -p $$
>>>>> 1867: old priority 5, new priority 0
>>>>>
>>>>> Which would suggest that we should be able to return to SCHED_OTHER
>>>>> RLIMIT_NICE-20.
>>>>
>>>> I dont remember anything subtle there - most likely we just forgot about that spot
>>>> when adding RLIMIT_RTPRIO support.
>>>
>>> Ah, I was arguing we should allow it regardless of RLIMIT_RTPRIO, based
>>> on RLIMIT_NICE, it is after all a change to SCHED_OTHER, not
>>> SCHED_FIFO/RR.
>>
>> So we need an OR test of RLIMIT_NICE | RLIMIT_RTPRIO ?
>
> Just RLIMIT_NICE I think.

Agreed.

>
>> The reason I keep
>> coming back to RTPRIO is it allows the user to change to
>> SCHED_(FIFO|RR), and from there they can change to anything they want -
>
> Hmm,. is that so? I would think that even if you're SCHED_FIFO changing
> back to SCHED_OTHER ought to make you respect RLIMIT_NICE.

You are correct, no gaps here.

>
> That is, even if you're a SCHED_FIFO-1 task due to RLIMIT_RTPRIO, when
> you switch back to SCHED_OTHER I would expect you not to be able to
> switch to a lower nice than RLIMIT_NICE-20.
>
> Now, if this isn't actually so I think we ought to make it so.
>
>> so why force two steps? Perhaps the argument is to keep the meaning of
>> the RLIMITs precise, and if you want to go from IDLE->OTHER you had
>> better properly set RLIMIT_NICE - maybe I just convinced myself.
>
> Right
>
>> Shall I respin the patch to reflect that?
>
> Please.

How about this:


>From b66e1a5b1e61476c1af0095f16c666b94664f7f0 Mon Sep 17 00:00:00 2001
From: Darren Hart <[email protected]>
Date: Thu, 17 Feb 2011 15:37:07 -0800
Subject: [PATCH] sched: allow users with sufficient RLIMIT_NICE to change from SCHED_IDLE policy

The current scheduler implementation returns -EPERM when trying to change from
SCHED_IDLE to SCHED_OTHER or SCHED_BATCH. Since SCHED_IDLE is considered to be
equivalent to a nice 20, changing to another policy should be allowed provided
the RLIMIT_NICE is accounted for.

This patch allows the following test-case to pass with RLIMIT_NICE=40, but still
fail with RLIMIT_NICE=10 when the calling process is run from a typical shell
(nice 0, or 20 in rlimit terms).

int main()
{
int ret;
struct sched_param sp;
sp.sched_priority = 0;

/* switch to SCHED_IDLE */
ret = sched_setscheduler(0, SCHED_IDLE, &sp);
printf("setscheduler IDLE: %d\n", ret);
if (ret) return ret;

/* switch back to SCHED_OTHER */
ret = sched_setscheduler(0, SCHED_OTHER, &sp);
printf("setscheduler OTHER: %d\n", ret);

return ret;
}

$ ulimit -e
40
$ ./test
setscheduler IDLE: 0
setscheduler OTHER: 0

$ ulimit -e 10
$ ulimit -e
10
$ ./test
setscheduler IDLE: 0
setscheduler OTHER: -1

Signed-off-by: Darren Hart <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Richard Purdie <[email protected]>
---
kernel/sched.c | 11 +++++++----
1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 18d38e4..9bf6284 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4822,12 +4822,15 @@ recheck:
param->sched_priority > rlim_rtprio)
return -EPERM;
}
+
/*
- * Like positive nice levels, dont allow tasks to
- * move out of SCHED_IDLE either:
+ * Treat SCHED_IDLE as nice 20. Only allow a switch to
+ * SCHED_NORMAL if the RLIMIT_NICE would normally permit it.
*/
- if (p->policy == SCHED_IDLE && policy != SCHED_IDLE)
- return -EPERM;
+ if (p->policy == SCHED_IDLE && policy != SCHED_IDLE) {
+ if (!can_nice(p, TASK_NICE(p)))
+ return -EPERM;
+ }

/* can't change other user's priorities */
if (!check_same_owner(p))
--
1.7.1


--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel

2011-02-24 11:49:33

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2/2] sched: allow users with rtprio rlimit to change from SCHED_IDLE policy

On Wed, 2011-02-23 at 13:28 -0800, Darren Hart wrote:
> From: Darren Hart <[email protected]>
> Date: Thu, 17 Feb 2011 15:37:07 -0800
> Subject: [PATCH] sched: allow users with sufficient RLIMIT_NICE to change from SCHED_IDLE policy
>
> The current scheduler implementation returns -EPERM when trying to change from
> SCHED_IDLE to SCHED_OTHER or SCHED_BATCH.


> Since SCHED_IDLE is considered to be
> equivalent to a nice 20,

Well, its not quite equivalent, its actually 5 times lighter still and
the preemption behaviour is slightly different as you've found ;-)

> changing to another policy should be allowed provided
> the RLIMIT_NICE is accounted for.
>
> This patch allows the following test-case to pass with RLIMIT_NICE=40, but still
> fail with RLIMIT_NICE=10 when the calling process is run from a typical shell
> (nice 0, or 20 in rlimit terms).

>
> Signed-off-by: Darren Hart <[email protected]>
> CC: Peter Zijlstra <[email protected]>
> CC: Ingo Molnar <[email protected]>
> CC: Richard Purdie <[email protected]>
> ---
> kernel/sched.c | 11 +++++++----
> 1 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 18d38e4..9bf6284 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -4822,12 +4822,15 @@ recheck:
> param->sched_priority > rlim_rtprio)
> return -EPERM;
> }
> +
> /*
> + * Treat SCHED_IDLE as nice 20. Only allow a switch to
> + * SCHED_NORMAL if the RLIMIT_NICE would normally permit it.
> */
> + if (p->policy == SCHED_IDLE && policy != SCHED_IDLE) {
> + if (!can_nice(p, TASK_NICE(p)))
> + return -EPERM;
> + }
>
> /* can't change other user's priorities */
> if (!check_same_owner(p))

Looks fine, thanks!

2011-03-04 11:50:01

by Darren Hart

[permalink] [raw]
Subject: [tip:sched/core] sched: Allow SCHED_BATCH to preempt SCHED_IDLE tasks

Commit-ID: a2f5c9ab79f78e8b91ac993e0543d65b661dd19b
Gitweb: http://git.kernel.org/tip/a2f5c9ab79f78e8b91ac993e0543d65b661dd19b
Author: Darren Hart <[email protected]>
AuthorDate: Tue, 22 Feb 2011 13:04:33 -0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 4 Mar 2011 11:14:29 +0100

sched: Allow SCHED_BATCH to preempt SCHED_IDLE tasks

Perform the test for SCHED_IDLE before testing for SCHED_BATCH (and
ensure idle tasks don't preempt idle tasks) so the non-interactive,
but still important, SCHED_BATCH tasks will run in favor of the very
low priority SCHED_IDLE tasks.

Signed-off-by: Darren Hart <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Acked-by: Mike Galbraith <[email protected]>
Cc: Richard Purdie <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched_fair.c | 12 +++++++-----
1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 3a88dee..1438e13 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1870,16 +1870,18 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
if (test_tsk_need_resched(curr))
return;

+ /* Idle tasks are by definition preempted by non-idle tasks. */
+ if (unlikely(curr->policy == SCHED_IDLE) &&
+ likely(p->policy != SCHED_IDLE))
+ goto preempt;
+
/*
- * Batch and idle tasks do not preempt (their preemption is driven by
- * the tick):
+ * Batch and idle tasks do not preempt non-idle tasks (their preemption
+ * is driven by the tick):
*/
if (unlikely(p->policy != SCHED_NORMAL))
return;

- /* Idle tasks are by definition preempted by everybody. */
- if (unlikely(curr->policy == SCHED_IDLE))
- goto preempt;

if (!sched_feat(WAKEUP_PREEMPT))
return;

2011-03-04 11:50:20

by Darren Hart

[permalink] [raw]
Subject: [tip:sched/core] sched: Allow users with sufficient RLIMIT_NICE to change from SCHED_IDLE policy

Commit-ID: c02aa73b1d18e43cfd79c2f193b225e84ca497c8
Gitweb: http://git.kernel.org/tip/c02aa73b1d18e43cfd79c2f193b225e84ca497c8
Author: Darren Hart <[email protected]>
AuthorDate: Thu, 17 Feb 2011 15:37:07 -0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 4 Mar 2011 11:14:30 +0100

sched: Allow users with sufficient RLIMIT_NICE to change from SCHED_IDLE policy

The current scheduler implementation returns -EPERM when trying to
change from SCHED_IDLE to SCHED_OTHER or SCHED_BATCH. Since SCHED_IDLE
is considered to be a nice 20 on steroids, changing to another policy
should be allowed provided the RLIMIT_NICE is accounted for.

This patch allows the following test-case to pass with RLIMIT_NICE=40,
but still fail with RLIMIT_NICE=10 when the calling process is run
from a typical shell (nice 0, or 20 in rlimit terms).

int main()
{
int ret;
struct sched_param sp;
sp.sched_priority = 0;

/* switch to SCHED_IDLE */
ret = sched_setscheduler(0, SCHED_IDLE, &sp);
printf("setscheduler IDLE: %d\n", ret);
if (ret) return ret;

/* switch back to SCHED_OTHER */
ret = sched_setscheduler(0, SCHED_OTHER, &sp);
printf("setscheduler OTHER: %d\n", ret);

return ret;
}

$ ulimit -e
40
$ ./test
setscheduler IDLE: 0
setscheduler OTHER: 0

$ ulimit -e 10
$ ulimit -e
10
$ ./test
setscheduler IDLE: 0
setscheduler OTHER: -1

Signed-off-by: Darren Hart <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Richard Purdie <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched.c | 11 +++++++----
1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 0c87126..f303070 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4981,12 +4981,15 @@ recheck:
param->sched_priority > rlim_rtprio)
return -EPERM;
}
+
/*
- * Like positive nice levels, dont allow tasks to
- * move out of SCHED_IDLE either:
+ * Treat SCHED_IDLE as nice 20. Only allow a switch to
+ * SCHED_NORMAL if the RLIMIT_NICE would normally permit it.
*/
- if (p->policy == SCHED_IDLE && policy != SCHED_IDLE)
- return -EPERM;
+ if (p->policy == SCHED_IDLE && policy != SCHED_IDLE) {
+ if (!can_nice(p, TASK_NICE(p)))
+ return -EPERM;
+ }

/* can't change other user's priorities */
if (!check_same_owner(p))