The following patches tweak the scheduling behavior surrounding SCHED_BATCH
tasks slightly. While experimenting on ways to improve performance of a build
system I stumbled across what I thought to be oversights in the implementation.
Specifically with respect to rtprio privileges and batch-idle task interaction.
--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
Perform the test for SCHED_IDLE before testing for SCHED_BATCH (and ensure idle
tasks don't preempt idle tasks) so the non-interactive, but still important,
SCHED_BATCH tasks will run in favor of the very low priority SCHED_IDLE tasks.
Signed-off-by: Darren Hart <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Richard Purdie <[email protected]>
---
kernel/sched_fair.c | 12 +++++++-----
1 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 0c26e2d..ff04bbd 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1857,16 +1857,18 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
if (test_tsk_need_resched(curr))
return;
+ /* Idle tasks are by definition preempted by non-idle tasks. */
+ if (unlikely(curr->policy == SCHED_IDLE) &&
+ likely(p->policy != SCHED_IDLE))
+ goto preempt;
+
/*
- * Batch and idle tasks do not preempt (their preemption is driven by
- * the tick):
+ * Batch and idle tasks do not preempt non-idle tasks (their preemption
+ * is driven by the tick):
*/
if (unlikely(p->policy != SCHED_NORMAL))
return;
- /* Idle tasks are by definition preempted by everybody. */
- if (unlikely(curr->policy == SCHED_IDLE))
- goto preempt;
if (!sched_feat(WAKEUP_PREEMPT))
return;
--
1.7.1
As it stands, users with rtprio rlimit permissions can change their policy from
SCHED_OTHER to SCHED_FIFO and back. They can change to SCHED_IDLE, but not back
to SCHED_FIFO. If they have the rtprio permission, they should be able to. Once
in SCHED_FIFO, they could go back to SCHED_OTHER. This patch allows users with
rtprio permission to change out of SCHED_IDLE.
This patch allows the following test-case to pass for users with rtprio
permissions with or without the ifdef block:
int main()
{
int ret;
struct sched_param sp;
/* switch to SCHED_FIFO */
sp.sched_priority = 1;
ret = sched_setscheduler(0, SCHED_FIFO, &sp);
printf("setscheduler FIFO: %d\n", ret);
if (ret) return ret;
/* switch to SCHED_IDLE */
sp.sched_priority = 0;
ret = sched_setscheduler(0, SCHED_IDLE, &sp);
printf("setscheduler IDLE: %d\n", ret);
if (ret) return ret;
/* switch to SCHED_FIFO */
sp.sched_priority = 1;
ret = sched_setscheduler(0, SCHED_FIFO, &sp);
printf("setscheduler FIFO: %d\n", ret);
if (ret) return ret;
/* switch back to SCHED_OTHER */
sp.sched_priority = 0;
ret = sched_setscheduler(0, SCHED_OTHER, &sp);
printf("setscheduler OTHER: %d\n", ret);
return ret;
}
Signed-off-by: Darren Hart <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Richard Purdie <[email protected]>
---
kernel/sched.c | 12 ++++++++----
1 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index 18d38e4..ec6943e 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4822,12 +4822,16 @@ recheck:
param->sched_priority > rlim_rtprio)
return -EPERM;
}
+
/*
- * Like positive nice levels, dont allow tasks to
- * move out of SCHED_IDLE either:
+ * Like positive nice levels, don't allow tasks to move out of
+ * SCHED_IDLE either. Make an exception if the user can set rt
+ * policy normally.
*/
- if (p->policy == SCHED_IDLE && policy != SCHED_IDLE)
- return -EPERM;
+ if (p->policy == SCHED_IDLE && policy != SCHED_IDLE) {
+ if (!task_rlimit(p, RLIMIT_RTPRIO))
+ return -EPERM;
+ }
/* can't change other user's priorities */
if (!check_same_owner(p))
--
1.7.1
On Tue, 2011-02-22 at 13:04 -0800, Darren Hart wrote:
> Perform the test for SCHED_IDLE before testing for SCHED_BATCH (and ensure idle
> tasks don't preempt idle tasks) so the non-interactive, but still important,
> SCHED_BATCH tasks will run in favor of the very low priority SCHED_IDLE tasks.
Yeah, that could be construed as a fairness fix for light SCHED_BATCH vs
a heavy SCHED_IDLE. It should lower latencies for both when mixed.
Acked-by: Mike Galbraith <[email protected]>
Nit below.
> Signed-off-by: Darren Hart <[email protected]>
> CC: Peter Zijlstra <[email protected]>
> CC: Ingo Molnar <[email protected]>
> CC: Richard Purdie <[email protected]>
> ---
> kernel/sched_fair.c | 12 +++++++-----
> 1 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 0c26e2d..ff04bbd 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -1857,16 +1857,18 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
> if (test_tsk_need_resched(curr))
> return;
>
> + /* Idle tasks are by definition preempted by non-idle tasks. */
> + if (unlikely(curr->policy == SCHED_IDLE) &&
> + likely(p->policy != SCHED_IDLE))
> + goto preempt;
> +
if (unlikely(curr->policy == SCHED_IDLE && p->policy != curr->policy))
goto preempt;
Looks better to me.
-Mike
On Wed, 2011-02-23 at 05:20 +0100, Mike Galbraith wrote:
> On Tue, 2011-02-22 at 13:04 -0800, Darren Hart wrote:
> > Perform the test for SCHED_IDLE before testing for SCHED_BATCH (and ensure idle
> > tasks don't preempt idle tasks) so the non-interactive, but still important,
> > SCHED_BATCH tasks will run in favor of the very low priority SCHED_IDLE tasks.
>
> Yeah, that could be construed as a fairness fix for light SCHED_BATCH vs
> a heavy SCHED_IDLE. It should lower latencies for both when mixed.
Hm. Seems SCHED_IDLE _always_ being preempted is a potential terminal
starvation bug though, unless preempt_tick() checks spread to guarantee
that the preempted task will eventually get the CPU back, even in the
face of massive non-idle wakeup driven load.. which it does not. (idle
task holds resource?)
Maybe my imagination has had too much java though.
-Mike
On 02/22/2011 08:20 PM, Mike Galbraith wrote:
> On Tue, 2011-02-22 at 13:04 -0800, Darren Hart wrote:
>> Perform the test for SCHED_IDLE before testing for SCHED_BATCH (and ensure idle
>> tasks don't preempt idle tasks) so the non-interactive, but still important,
>> SCHED_BATCH tasks will run in favor of the very low priority SCHED_IDLE tasks.
>
> Yeah, that could be construed as a fairness fix for light SCHED_BATCH vs
> a heavy SCHED_IDLE. It should lower latencies for both when mixed.
>
> Acked-by: Mike Galbraith<[email protected]>
>
> Nit below.
>
>> Signed-off-by: Darren Hart<[email protected]>
>> CC: Peter Zijlstra<[email protected]>
>> CC: Ingo Molnar<[email protected]>
>> CC: Richard Purdie<[email protected]>
>> ---
>> kernel/sched_fair.c | 12 +++++++-----
>> 1 files changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
>> index 0c26e2d..ff04bbd 100644
>> --- a/kernel/sched_fair.c
>> +++ b/kernel/sched_fair.c
>> @@ -1857,16 +1857,18 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
>> if (test_tsk_need_resched(curr))
>> return;
>>
>> + /* Idle tasks are by definition preempted by non-idle tasks. */
>> + if (unlikely(curr->policy == SCHED_IDLE)&&
>> + likely(p->policy != SCHED_IDLE))
>> + goto preempt;
>> +
>
> if (unlikely(curr->policy == SCHED_IDLE&& p->policy != curr->policy))
> goto preempt;
>
> Looks better to me.
I have no opinion on the unlikely/likely optimizations. I chose the way
I did as I thought it was more consistent with the existing code. I'll
leave that to Peter and Ingo - let me know if I should resend.
--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
On Tue, 2011-02-22 at 13:04 -0800, Darren Hart wrote:
> As it stands, users with rtprio rlimit permissions can change their policy from
> SCHED_OTHER to SCHED_FIFO and back. They can change to SCHED_IDLE, but not back
> to SCHED_FIFO. If they have the rtprio permission, they should be able to. Once
> in SCHED_FIFO, they could go back to SCHED_OTHER. This patch allows users with
> rtprio permission to change out of SCHED_IDLE.
>
Ingo, can you remember the rationale for this?
The fact is that SCHED_IDLE is very near nice-20, and we can do:
peterz@twins:~$ renice 5 -p $$
1867: old priority 0, new priority 5
peterz@twins:~$ renice 0 -p $$
1867: old priority 5, new priority 0
Which would suggest that we should be able to return to SCHED_OTHER
RLIMIT_NICE-20.
Hmm?
* Peter Zijlstra <[email protected]> wrote:
> On Tue, 2011-02-22 at 13:04 -0800, Darren Hart wrote:
> > As it stands, users with rtprio rlimit permissions can change their policy from
> > SCHED_OTHER to SCHED_FIFO and back. They can change to SCHED_IDLE, but not back
> > to SCHED_FIFO. If they have the rtprio permission, they should be able to. Once
> > in SCHED_FIFO, they could go back to SCHED_OTHER. This patch allows users with
> > rtprio permission to change out of SCHED_IDLE.
> >
>
> Ingo, can you remember the rationale for this?
>
> The fact is that SCHED_IDLE is very near nice-20, and we can do:
>
> peterz@twins:~$ renice 5 -p $$
> 1867: old priority 0, new priority 5
> peterz@twins:~$ renice 0 -p $$
> 1867: old priority 5, new priority 0
>
> Which would suggest that we should be able to return to SCHED_OTHER
> RLIMIT_NICE-20.
I dont remember anything subtle there - most likely we just forgot about that spot
when adding RLIMIT_RTPRIO support.
Thanks,
Ingo
On Wed, 2011-02-23 at 12:13 +0100, Ingo Molnar wrote:
> * Peter Zijlstra <[email protected]> wrote:
>
> > On Tue, 2011-02-22 at 13:04 -0800, Darren Hart wrote:
> > > As it stands, users with rtprio rlimit permissions can change their policy from
> > > SCHED_OTHER to SCHED_FIFO and back. They can change to SCHED_IDLE, but not back
> > > to SCHED_FIFO. If they have the rtprio permission, they should be able to. Once
> > > in SCHED_FIFO, they could go back to SCHED_OTHER. This patch allows users with
> > > rtprio permission to change out of SCHED_IDLE.
> > >
> >
> > Ingo, can you remember the rationale for this?
> >
> > The fact is that SCHED_IDLE is very near nice-20, and we can do:
> >
> > peterz@twins:~$ renice 5 -p $$
> > 1867: old priority 0, new priority 5
> > peterz@twins:~$ renice 0 -p $$
> > 1867: old priority 5, new priority 0
> >
> > Which would suggest that we should be able to return to SCHED_OTHER
> > RLIMIT_NICE-20.
>
> I dont remember anything subtle there - most likely we just forgot about that spot
> when adding RLIMIT_RTPRIO support.
Ah, I was arguing we should allow it regardless of RLIMIT_RTPRIO, based
on RLIMIT_NICE, it is after all a change to SCHED_OTHER, not
SCHED_FIFO/RR.
* Peter Zijlstra <[email protected]> wrote:
> On Wed, 2011-02-23 at 12:13 +0100, Ingo Molnar wrote:
> > * Peter Zijlstra <[email protected]> wrote:
> >
> > > On Tue, 2011-02-22 at 13:04 -0800, Darren Hart wrote:
> > > > As it stands, users with rtprio rlimit permissions can change their policy from
> > > > SCHED_OTHER to SCHED_FIFO and back. They can change to SCHED_IDLE, but not back
> > > > to SCHED_FIFO. If they have the rtprio permission, they should be able to. Once
> > > > in SCHED_FIFO, they could go back to SCHED_OTHER. This patch allows users with
> > > > rtprio permission to change out of SCHED_IDLE.
> > > >
> > >
> > > Ingo, can you remember the rationale for this?
> > >
> > > The fact is that SCHED_IDLE is very near nice-20, and we can do:
> > >
> > > peterz@twins:~$ renice 5 -p $$
> > > 1867: old priority 0, new priority 5
> > > peterz@twins:~$ renice 0 -p $$
> > > 1867: old priority 5, new priority 0
> > >
> > > Which would suggest that we should be able to return to SCHED_OTHER
> > > RLIMIT_NICE-20.
> >
> > I dont remember anything subtle there - most likely we just forgot about that spot
> > when adding RLIMIT_RTPRIO support.
>
> Ah, I was arguing we should allow it regardless of RLIMIT_RTPRIO, based on
> RLIMIT_NICE, it is after all a change to SCHED_OTHER, not SCHED_FIFO/RR.
Sure. We just went for the most restrictive conditions - it's hard to add security
holes that way ;-)
Ingo
On 02/23/2011 03:17 AM, Peter Zijlstra wrote:
> On Wed, 2011-02-23 at 12:13 +0100, Ingo Molnar wrote:
>> * Peter Zijlstra<[email protected]> wrote:
>>
>>> On Tue, 2011-02-22 at 13:04 -0800, Darren Hart wrote:
>>>> As it stands, users with rtprio rlimit permissions can change their policy from
>>>> SCHED_OTHER to SCHED_FIFO and back. They can change to SCHED_IDLE, but not back
>>>> to SCHED_FIFO. If they have the rtprio permission, they should be able to. Once
>>>> in SCHED_FIFO, they could go back to SCHED_OTHER. This patch allows users with
>>>> rtprio permission to change out of SCHED_IDLE.
>>>>
>>>
>>> Ingo, can you remember the rationale for this?
>>>
>>> The fact is that SCHED_IDLE is very near nice-20, and we can do:
>>>
>>> peterz@twins:~$ renice 5 -p $$
>>> 1867: old priority 0, new priority 5
>>> peterz@twins:~$ renice 0 -p $$
>>> 1867: old priority 5, new priority 0
>>>
>>> Which would suggest that we should be able to return to SCHED_OTHER
>>> RLIMIT_NICE-20.
>>
>> I dont remember anything subtle there - most likely we just forgot about that spot
>> when adding RLIMIT_RTPRIO support.
>
> Ah, I was arguing we should allow it regardless of RLIMIT_RTPRIO, based
> on RLIMIT_NICE, it is after all a change to SCHED_OTHER, not
> SCHED_FIFO/RR.
So we need an OR test of RLIMIT_NICE | RLIMIT_RTPRIO ? The reason I keep
coming back to RTPRIO is it allows the user to change to
SCHED_(FIFO|RR), and from there they can change to anything they want -
so why force two steps? Perhaps the argument is to keep the meaning of
the RLIMITs precise, and if you want to go from IDLE->OTHER you had
better properly set RLIMIT_NICE - maybe I just convinced myself.
Shall I respin the patch to reflect that?
--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
On Wed, 2011-02-23 at 07:52 -0800, Darren Hart wrote:
> On 02/23/2011 03:17 AM, Peter Zijlstra wrote:
> > On Wed, 2011-02-23 at 12:13 +0100, Ingo Molnar wrote:
> >> * Peter Zijlstra<[email protected]> wrote:
> >>
> >>> On Tue, 2011-02-22 at 13:04 -0800, Darren Hart wrote:
> >>>> As it stands, users with rtprio rlimit permissions can change their policy from
> >>>> SCHED_OTHER to SCHED_FIFO and back. They can change to SCHED_IDLE, but not back
> >>>> to SCHED_FIFO. If they have the rtprio permission, they should be able to. Once
> >>>> in SCHED_FIFO, they could go back to SCHED_OTHER. This patch allows users with
> >>>> rtprio permission to change out of SCHED_IDLE.
> >>>>
> >>>
> >>> Ingo, can you remember the rationale for this?
> >>>
> >>> The fact is that SCHED_IDLE is very near nice-20, and we can do:
> >>>
> >>> peterz@twins:~$ renice 5 -p $$
> >>> 1867: old priority 0, new priority 5
> >>> peterz@twins:~$ renice 0 -p $$
> >>> 1867: old priority 5, new priority 0
> >>>
> >>> Which would suggest that we should be able to return to SCHED_OTHER
> >>> RLIMIT_NICE-20.
> >>
> >> I dont remember anything subtle there - most likely we just forgot about that spot
> >> when adding RLIMIT_RTPRIO support.
> >
> > Ah, I was arguing we should allow it regardless of RLIMIT_RTPRIO, based
> > on RLIMIT_NICE, it is after all a change to SCHED_OTHER, not
> > SCHED_FIFO/RR.
>
> So we need an OR test of RLIMIT_NICE | RLIMIT_RTPRIO ?
Just RLIMIT_NICE I think.
> The reason I keep
> coming back to RTPRIO is it allows the user to change to
> SCHED_(FIFO|RR), and from there they can change to anything they want -
Hmm,. is that so? I would think that even if you're SCHED_FIFO changing
back to SCHED_OTHER ought to make you respect RLIMIT_NICE.
That is, even if you're a SCHED_FIFO-1 task due to RLIMIT_RTPRIO, when
you switch back to SCHED_OTHER I would expect you not to be able to
switch to a lower nice than RLIMIT_NICE-20.
Now, if this isn't actually so I think we ought to make it so.
> so why force two steps? Perhaps the argument is to keep the meaning of
> the RLIMITs precise, and if you want to go from IDLE->OTHER you had
> better properly set RLIMIT_NICE - maybe I just convinced myself.
Right
> Shall I respin the patch to reflect that?
Please.
On 02/23/2011 08:00 AM, Peter Zijlstra wrote:
> On Wed, 2011-02-23 at 07:52 -0800, Darren Hart wrote:
>> On 02/23/2011 03:17 AM, Peter Zijlstra wrote:
>>> On Wed, 2011-02-23 at 12:13 +0100, Ingo Molnar wrote:
>>>> * Peter Zijlstra<[email protected]> wrote:
>>>>
>>>>> On Tue, 2011-02-22 at 13:04 -0800, Darren Hart wrote:
>>>>>> As it stands, users with rtprio rlimit permissions can change their policy from
>>>>>> SCHED_OTHER to SCHED_FIFO and back. They can change to SCHED_IDLE, but not back
>>>>>> to SCHED_FIFO. If they have the rtprio permission, they should be able to. Once
>>>>>> in SCHED_FIFO, they could go back to SCHED_OTHER. This patch allows users with
>>>>>> rtprio permission to change out of SCHED_IDLE.
>>>>>>
>>>>>
>>>>> Ingo, can you remember the rationale for this?
>>>>>
>>>>> The fact is that SCHED_IDLE is very near nice-20, and we can do:
>>>>>
>>>>> peterz@twins:~$ renice 5 -p $$
>>>>> 1867: old priority 0, new priority 5
>>>>> peterz@twins:~$ renice 0 -p $$
>>>>> 1867: old priority 5, new priority 0
>>>>>
>>>>> Which would suggest that we should be able to return to SCHED_OTHER
>>>>> RLIMIT_NICE-20.
>>>>
>>>> I dont remember anything subtle there - most likely we just forgot about that spot
>>>> when adding RLIMIT_RTPRIO support.
>>>
>>> Ah, I was arguing we should allow it regardless of RLIMIT_RTPRIO, based
>>> on RLIMIT_NICE, it is after all a change to SCHED_OTHER, not
>>> SCHED_FIFO/RR.
>>
>> So we need an OR test of RLIMIT_NICE | RLIMIT_RTPRIO ?
>
> Just RLIMIT_NICE I think.
>
>> The reason I keep
>> coming back to RTPRIO is it allows the user to change to
>> SCHED_(FIFO|RR), and from there they can change to anything they want -
>
> Hmm,. is that so? I would think that even if you're SCHED_FIFO changing
> back to SCHED_OTHER ought to make you respect RLIMIT_NICE.
>
> That is, even if you're a SCHED_FIFO-1 task due to RLIMIT_RTPRIO, when
> you switch back to SCHED_OTHER I would expect you not to be able to
> switch to a lower nice than RLIMIT_NICE-20.
>
> Now, if this isn't actually so I think we ought to make it so.
I was just thinking in terms of POLICY, not priority or nice level. I'll
do some experimenting and see where the limits are.
>
>> so why force two steps? Perhaps the argument is to keep the meaning of
>> the RLIMITs precise, and if you want to go from IDLE->OTHER you had
>> better properly set RLIMIT_NICE - maybe I just convinced myself.
>
> Right
>
>> Shall I respin the patch to reflect that?
>
> Please.
Will do.
--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
On 02/23/2011 08:00 AM, Peter Zijlstra wrote:
> On Wed, 2011-02-23 at 07:52 -0800, Darren Hart wrote:
>> On 02/23/2011 03:17 AM, Peter Zijlstra wrote:
>>> On Wed, 2011-02-23 at 12:13 +0100, Ingo Molnar wrote:
>>>> * Peter Zijlstra<[email protected]> wrote:
>>>>
>>>>> On Tue, 2011-02-22 at 13:04 -0800, Darren Hart wrote:
>>>>>> As it stands, users with rtprio rlimit permissions can change their policy from
>>>>>> SCHED_OTHER to SCHED_FIFO and back. They can change to SCHED_IDLE, but not back
>>>>>> to SCHED_FIFO. If they have the rtprio permission, they should be able to. Once
>>>>>> in SCHED_FIFO, they could go back to SCHED_OTHER. This patch allows users with
>>>>>> rtprio permission to change out of SCHED_IDLE.
>>>>>>
>>>>>
>>>>> Ingo, can you remember the rationale for this?
>>>>>
>>>>> The fact is that SCHED_IDLE is very near nice-20, and we can do:
>>>>>
>>>>> peterz@twins:~$ renice 5 -p $$
>>>>> 1867: old priority 0, new priority 5
>>>>> peterz@twins:~$ renice 0 -p $$
>>>>> 1867: old priority 5, new priority 0
>>>>>
>>>>> Which would suggest that we should be able to return to SCHED_OTHER
>>>>> RLIMIT_NICE-20.
>>>>
>>>> I dont remember anything subtle there - most likely we just forgot about that spot
>>>> when adding RLIMIT_RTPRIO support.
>>>
>>> Ah, I was arguing we should allow it regardless of RLIMIT_RTPRIO, based
>>> on RLIMIT_NICE, it is after all a change to SCHED_OTHER, not
>>> SCHED_FIFO/RR.
>>
>> So we need an OR test of RLIMIT_NICE | RLIMIT_RTPRIO ?
>
> Just RLIMIT_NICE I think.
Agreed.
>
>> The reason I keep
>> coming back to RTPRIO is it allows the user to change to
>> SCHED_(FIFO|RR), and from there they can change to anything they want -
>
> Hmm,. is that so? I would think that even if you're SCHED_FIFO changing
> back to SCHED_OTHER ought to make you respect RLIMIT_NICE.
You are correct, no gaps here.
>
> That is, even if you're a SCHED_FIFO-1 task due to RLIMIT_RTPRIO, when
> you switch back to SCHED_OTHER I would expect you not to be able to
> switch to a lower nice than RLIMIT_NICE-20.
>
> Now, if this isn't actually so I think we ought to make it so.
>
>> so why force two steps? Perhaps the argument is to keep the meaning of
>> the RLIMITs precise, and if you want to go from IDLE->OTHER you had
>> better properly set RLIMIT_NICE - maybe I just convinced myself.
>
> Right
>
>> Shall I respin the patch to reflect that?
>
> Please.
How about this:
>From b66e1a5b1e61476c1af0095f16c666b94664f7f0 Mon Sep 17 00:00:00 2001
From: Darren Hart <[email protected]>
Date: Thu, 17 Feb 2011 15:37:07 -0800
Subject: [PATCH] sched: allow users with sufficient RLIMIT_NICE to change from SCHED_IDLE policy
The current scheduler implementation returns -EPERM when trying to change from
SCHED_IDLE to SCHED_OTHER or SCHED_BATCH. Since SCHED_IDLE is considered to be
equivalent to a nice 20, changing to another policy should be allowed provided
the RLIMIT_NICE is accounted for.
This patch allows the following test-case to pass with RLIMIT_NICE=40, but still
fail with RLIMIT_NICE=10 when the calling process is run from a typical shell
(nice 0, or 20 in rlimit terms).
int main()
{
int ret;
struct sched_param sp;
sp.sched_priority = 0;
/* switch to SCHED_IDLE */
ret = sched_setscheduler(0, SCHED_IDLE, &sp);
printf("setscheduler IDLE: %d\n", ret);
if (ret) return ret;
/* switch back to SCHED_OTHER */
ret = sched_setscheduler(0, SCHED_OTHER, &sp);
printf("setscheduler OTHER: %d\n", ret);
return ret;
}
$ ulimit -e
40
$ ./test
setscheduler IDLE: 0
setscheduler OTHER: 0
$ ulimit -e 10
$ ulimit -e
10
$ ./test
setscheduler IDLE: 0
setscheduler OTHER: -1
Signed-off-by: Darren Hart <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Richard Purdie <[email protected]>
---
kernel/sched.c | 11 +++++++----
1 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index 18d38e4..9bf6284 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4822,12 +4822,15 @@ recheck:
param->sched_priority > rlim_rtprio)
return -EPERM;
}
+
/*
- * Like positive nice levels, dont allow tasks to
- * move out of SCHED_IDLE either:
+ * Treat SCHED_IDLE as nice 20. Only allow a switch to
+ * SCHED_NORMAL if the RLIMIT_NICE would normally permit it.
*/
- if (p->policy == SCHED_IDLE && policy != SCHED_IDLE)
- return -EPERM;
+ if (p->policy == SCHED_IDLE && policy != SCHED_IDLE) {
+ if (!can_nice(p, TASK_NICE(p)))
+ return -EPERM;
+ }
/* can't change other user's priorities */
if (!check_same_owner(p))
--
1.7.1
--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
On Wed, 2011-02-23 at 13:28 -0800, Darren Hart wrote:
> From: Darren Hart <[email protected]>
> Date: Thu, 17 Feb 2011 15:37:07 -0800
> Subject: [PATCH] sched: allow users with sufficient RLIMIT_NICE to change from SCHED_IDLE policy
>
> The current scheduler implementation returns -EPERM when trying to change from
> SCHED_IDLE to SCHED_OTHER or SCHED_BATCH.
> Since SCHED_IDLE is considered to be
> equivalent to a nice 20,
Well, its not quite equivalent, its actually 5 times lighter still and
the preemption behaviour is slightly different as you've found ;-)
> changing to another policy should be allowed provided
> the RLIMIT_NICE is accounted for.
>
> This patch allows the following test-case to pass with RLIMIT_NICE=40, but still
> fail with RLIMIT_NICE=10 when the calling process is run from a typical shell
> (nice 0, or 20 in rlimit terms).
>
> Signed-off-by: Darren Hart <[email protected]>
> CC: Peter Zijlstra <[email protected]>
> CC: Ingo Molnar <[email protected]>
> CC: Richard Purdie <[email protected]>
> ---
> kernel/sched.c | 11 +++++++----
> 1 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 18d38e4..9bf6284 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -4822,12 +4822,15 @@ recheck:
> param->sched_priority > rlim_rtprio)
> return -EPERM;
> }
> +
> /*
> + * Treat SCHED_IDLE as nice 20. Only allow a switch to
> + * SCHED_NORMAL if the RLIMIT_NICE would normally permit it.
> */
> + if (p->policy == SCHED_IDLE && policy != SCHED_IDLE) {
> + if (!can_nice(p, TASK_NICE(p)))
> + return -EPERM;
> + }
>
> /* can't change other user's priorities */
> if (!check_same_owner(p))
Looks fine, thanks!
Commit-ID: a2f5c9ab79f78e8b91ac993e0543d65b661dd19b
Gitweb: http://git.kernel.org/tip/a2f5c9ab79f78e8b91ac993e0543d65b661dd19b
Author: Darren Hart <[email protected]>
AuthorDate: Tue, 22 Feb 2011 13:04:33 -0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 4 Mar 2011 11:14:29 +0100
sched: Allow SCHED_BATCH to preempt SCHED_IDLE tasks
Perform the test for SCHED_IDLE before testing for SCHED_BATCH (and
ensure idle tasks don't preempt idle tasks) so the non-interactive,
but still important, SCHED_BATCH tasks will run in favor of the very
low priority SCHED_IDLE tasks.
Signed-off-by: Darren Hart <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Acked-by: Mike Galbraith <[email protected]>
Cc: Richard Purdie <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched_fair.c | 12 +++++++-----
1 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 3a88dee..1438e13 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1870,16 +1870,18 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
if (test_tsk_need_resched(curr))
return;
+ /* Idle tasks are by definition preempted by non-idle tasks. */
+ if (unlikely(curr->policy == SCHED_IDLE) &&
+ likely(p->policy != SCHED_IDLE))
+ goto preempt;
+
/*
- * Batch and idle tasks do not preempt (their preemption is driven by
- * the tick):
+ * Batch and idle tasks do not preempt non-idle tasks (their preemption
+ * is driven by the tick):
*/
if (unlikely(p->policy != SCHED_NORMAL))
return;
- /* Idle tasks are by definition preempted by everybody. */
- if (unlikely(curr->policy == SCHED_IDLE))
- goto preempt;
if (!sched_feat(WAKEUP_PREEMPT))
return;
Commit-ID: c02aa73b1d18e43cfd79c2f193b225e84ca497c8
Gitweb: http://git.kernel.org/tip/c02aa73b1d18e43cfd79c2f193b225e84ca497c8
Author: Darren Hart <[email protected]>
AuthorDate: Thu, 17 Feb 2011 15:37:07 -0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 4 Mar 2011 11:14:30 +0100
sched: Allow users with sufficient RLIMIT_NICE to change from SCHED_IDLE policy
The current scheduler implementation returns -EPERM when trying to
change from SCHED_IDLE to SCHED_OTHER or SCHED_BATCH. Since SCHED_IDLE
is considered to be a nice 20 on steroids, changing to another policy
should be allowed provided the RLIMIT_NICE is accounted for.
This patch allows the following test-case to pass with RLIMIT_NICE=40,
but still fail with RLIMIT_NICE=10 when the calling process is run
from a typical shell (nice 0, or 20 in rlimit terms).
int main()
{
int ret;
struct sched_param sp;
sp.sched_priority = 0;
/* switch to SCHED_IDLE */
ret = sched_setscheduler(0, SCHED_IDLE, &sp);
printf("setscheduler IDLE: %d\n", ret);
if (ret) return ret;
/* switch back to SCHED_OTHER */
ret = sched_setscheduler(0, SCHED_OTHER, &sp);
printf("setscheduler OTHER: %d\n", ret);
return ret;
}
$ ulimit -e
40
$ ./test
setscheduler IDLE: 0
setscheduler OTHER: 0
$ ulimit -e 10
$ ulimit -e
10
$ ./test
setscheduler IDLE: 0
setscheduler OTHER: -1
Signed-off-by: Darren Hart <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Richard Purdie <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched.c | 11 +++++++----
1 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index 0c87126..f303070 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4981,12 +4981,15 @@ recheck:
param->sched_priority > rlim_rtprio)
return -EPERM;
}
+
/*
- * Like positive nice levels, dont allow tasks to
- * move out of SCHED_IDLE either:
+ * Treat SCHED_IDLE as nice 20. Only allow a switch to
+ * SCHED_NORMAL if the RLIMIT_NICE would normally permit it.
*/
- if (p->policy == SCHED_IDLE && policy != SCHED_IDLE)
- return -EPERM;
+ if (p->policy == SCHED_IDLE && policy != SCHED_IDLE) {
+ if (!can_nice(p, TASK_NICE(p)))
+ return -EPERM;
+ }
/* can't change other user's priorities */
if (!check_same_owner(p))