Ingo,
I tested this one, and I think it makes sense in any case as an
optimization. It should also be good for -stable kernels.
Does it look OK?
Thanks,
Miklos
----
From: Miklos Szeredi <[email protected]>
This patch fixes bug #12208:
http://bugzilla.kernel.org/show_bug.cgi?id=12208
Don't preempt tasks in preempt_schedule() if they are already in the
process of going to sleep. Otherwise the task would wake up only to
go to sleep again.
Due to the way wait_task_inactive() works, this can also drastically
slow down ptrace:
- task A is ptracing task B
- task B stops on a trace event
- task A is woken up and preempts task B
- task A calls ptrace on task B, which does ptrace_check_attach()
- this calls wait_task_inactive(), which sees that task B is still on the runq
- task A goes to sleep for a jiffy
- ...
Since UML does lots of the above sequences, those jiffies quickly add
up to make it slow as hell.
Signed-off-by: Miklos Szeredi <[email protected]>
CC: [email protected]
---
kernel/sched.c | 4 ++++
1 file changed, 4 insertions(+)
Index: linux.git/kernel/sched.c
===================================================================
--- linux.git.orig/kernel/sched.c 2009-03-20 09:40:47.000000000 +0100
+++ linux.git/kernel/sched.c 2009-03-20 10:28:56.000000000 +0100
@@ -4632,6 +4632,10 @@ asmlinkage void __sched preempt_schedule
 	if (likely(ti->preempt_count || irqs_disabled()))
 		return;
 
+	/* No point in preempting if we are just about to go to sleep. */
+	if (current->state != TASK_RUNNING)
+		return;
+
 	do {
 		add_preempt_count(PREEMPT_ACTIVE);
 		schedule();
On Fri, 2009-03-20 at 10:43 +0100, Miklos Szeredi wrote:
> Ingo,
>
> I tested this one, and I think it makes sense in any case as an
> optimization. It should also be good for -stable kernels.
>
> Does it look OK?
The idea is good, but there is a risk of preemption latencies here. Some
code paths aren't really quick between setting ->state != TASK_RUNNING and
calling schedule().

[ Quick as in: O(1) and only a few instructions. ]

So if we're going to do this, we'd need to audit all such code paths --
and there are lots of them.
The first line of attack for this problem is making wait_task_inactive()
suck less, which shouldn't be too hard; that unconditional 1 jiffy
sleep is simply wasteful.
> Index: linux.git/kernel/sched.c
> ===================================================================
> --- linux.git.orig/kernel/sched.c 2009-03-20 09:40:47.000000000 +0100
> +++ linux.git/kernel/sched.c 2009-03-20 10:28:56.000000000 +0100
> @@ -4632,6 +4632,10 @@ asmlinkage void __sched preempt_schedule
> if (likely(ti->preempt_count || irqs_disabled()))
> return;
>
> +	/* No point in preempting if we are just about to go to sleep. */
> + if (current->state != TASK_RUNNING)
> + return;
> +
> do {
> add_preempt_count(PREEMPT_ACTIVE);
> schedule();
On Fri, 20 Mar 2009, Peter Zijlstra wrote:
> On Fri, 2009-03-20 at 10:43 +0100, Miklos Szeredi wrote:
> > Ingo,
> >
> > I tested this one, and I think it makes sense in any case as an
> > optimization. It should also be good for -stable kernels.
> >
> > Does it look OK?
>
> The idea is good, but there is a risk of preemption latencies here. Some
> code paths aren't really quick between setting ->state != TASK_RUNNING and
> calling schedule().
>
> [ Quick as in: O(1) and only a few instructions. ]
>
> So if we're going to do this, we'd need to audit all such code paths --
> and there are lots of them.
Oh, yes.
In a random sample, the most common pattern is something like this:

	spin_lock(&some_lock);
	/* do something */
	set_current_state(TASK_SOMESLEEP);
	/* do something more */
	spin_unlock(&some_lock);
	schedule();
	...
This should only be positively impacted by the change, but I can
imagine rare cases where it's more complex.
> The first line of attack for this problem is making wait_task_inactive()
> suck less, which shouldn't be too hard; that unconditional 1 jiffy
> sleep is simply wasteful.
I completely agree. However, I'd like to have a non-invasive solution
that can go into current and stable kernels so UML users don't need to
suffer any more.
Thanks,
Miklos
>
> > Index: linux.git/kernel/sched.c
> > ===================================================================
> > --- linux.git.orig/kernel/sched.c 2009-03-20 09:40:47.000000000 +0100
> > +++ linux.git/kernel/sched.c 2009-03-20 10:28:56.000000000 +0100
> > @@ -4632,6 +4632,10 @@ asmlinkage void __sched preempt_schedule
> > if (likely(ti->preempt_count || irqs_disabled()))
> > return;
> >
> > +	/* No point in preempting if we are just about to go to sleep. */
> > + if (current->state != TASK_RUNNING)
> > + return;
> > +
> > do {
> > add_preempt_count(PREEMPT_ACTIVE);
> > schedule();
>
* Miklos Szeredi <[email protected]> wrote:
> On Fri, 20 Mar 2009, Peter Zijlstra wrote:
> > On Fri, 2009-03-20 at 10:43 +0100, Miklos Szeredi wrote:
> > > Ingo,
> > >
> > > I tested this one, and I think it makes sense in any case as an
> > > optimization. It should also be good for -stable kernels.
> > >
> > > Does it look OK?
> >
> > The idea is good, but there is a risk of preemption latencies here. Some
> > code paths aren't really quick between setting ->state != TASK_RUNNING and
> > calling schedule().
> >
> > [ Quick as in: O(1) and only a few instructions. ]
> >
> > So if we're going to do this, we'd need to audit all such code paths --
> > and there are lots of them.
>
> Oh, yes.
>
> In a random sample, the most common pattern is something like this:
>
> 	spin_lock(&some_lock);
> 	/* do something */
> 	set_current_state(TASK_SOMESLEEP);
> 	/* do something more */
> 	spin_unlock(&some_lock);
> 	schedule();
> 	...
>
> This should only be positively impacted by the change, but I can
> imagine rare cases where it's more complex.
I'd suggest spin_unlock_no_resched() and task_unlock_no_resched()
instead of open-coding preempt-disable sequences.
> > The first line of attack for this problem is making
> > wait_task_inactive() suck less, which shouldn't be too hard;
> > that unconditional 1 jiffy sleep is simply wasteful.
>
> I completely agree. However, I'd like to have a non-invasive
> solution that can go into current and stable kernels so UML users
> don't need to suffer any more.
Agreed. task_unlock_no_resched() should do that i think.
Ingo
On Fri, 20 Mar 2009, Ingo Molnar wrote:
> > > The first line of attack for this problem is making
> > > wait_task_inactive() suck less, which shouldn't be too hard;
> > > that unconditional 1 jiffy sleep is simply wasteful.
> >
> > I completely agree. However, I'd like to have a non-invasive
> > solution that can go into current and stable kernels so UML users
> > don't need to suffer any more.
>
> Agreed. task_unlock_no_resched() should do that i think.
I don't see how that would help.
ptrace_stop() specifically would need read_unlock_no_resched(). But
I'm reluctant to add more spinlock functions with all their
variants...
Thanks,
Miklos
* Miklos Szeredi <[email protected]> wrote:
> On Fri, 20 Mar 2009, Ingo Molnar wrote:
> > > > The first line of attack for this problem is making
> > > > wait_task_inactive() suck less, which shouldn't be too hard;
> > > > that unconditional 1 jiffy sleep is simply wasteful.
> > >
> > > I completely agree. However, I'd like to have a non-invasive
> > > solution that can go into current and stable kernels so UML users
> > > don't need to suffer any more.
> >
> > Agreed. task_unlock_no_resched() should do that i think.
>
> I don't see how that would help.
it more clearly expresses the need there, and we already have
_no_resched API variants (we add them on an as-needed basis).

Doing:

	preempt_disable();
	read_lock();
	...
	read_unlock();
	preempt_enable_no_resched();

really just open-codes read_unlock_no_resched() and uglifies the
code.
> ptrace_stop() specifically would need read_unlock_no_resched().
> But I'm reluctant to add more spinlock functions with all their
> variants...
if you worry about backportability, we can certainly add the easy
fix too, if it's followed by the more involved fix(es).
Ingo