2009-03-20 09:45:00

by Miklos Szeredi

[permalink] [raw]
Subject: [patch] don't preempt not TASK_RUNNING tasks

Ingo,

I tested this one, and I think it makes sense in any case as an
optimization. It should also be good for -stable kernels.

Does it look OK?

Thanks,
Miklos
----

From: Miklos Szeredi <[email protected]>

This patch fixes bug #12208:

http://bugzilla.kernel.org/show_bug.cgi?id=12208

Don't preempt tasks in preempt_schedule() if they are already in the
process of going to sleep. Otherwise the task would wake up only to
go to sleep again.

Due to the way wait_task_inactive() works this can also drastically
slow down ptrace:

- task A is ptracing task B
- task B stops on a trace event
- task A is woken up and preempts task B
- task A calls ptrace on task B, which does ptrace_check_attach()
- this calls wait_task_inactive(), which sees that task B is still on the runq
- task A goes to sleep for a jiffy
- ...

Since UML does lots of the above sequences, those jiffies quickly add
up to make it slow as hell.

Signed-off-by: Miklos Szeredi <[email protected]>
CC: [email protected]
---
kernel/sched.c | 4 ++++
1 file changed, 4 insertions(+)

Index: linux.git/kernel/sched.c
===================================================================
--- linux.git.orig/kernel/sched.c 2009-03-20 09:40:47.000000000 +0100
+++ linux.git/kernel/sched.c 2009-03-20 10:28:56.000000000 +0100
@@ -4632,6 +4632,10 @@ asmlinkage void __sched preempt_schedule
if (likely(ti->preempt_count || irqs_disabled()))
return;

+ /* No point in preempting we are just about to go to sleep. */
+ if (current->state != TASK_RUNNING)
+ return;
+
do {
add_preempt_count(PREEMPT_ACTIVE);
schedule();


2009-03-20 10:12:39

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [patch] don't preempt not TASK_RUNNING tasks

On Fri, 2009-03-20 at 10:43 +0100, Miklos Szeredi wrote:
> Ingo,
>
> I tested this one, and I think it makes sense in any case as an
> optimization. It should also be good for -stable kernels.
>
> Does it look OK?

The idea is good, but there is a risk of preemption latencies here. Some
code paths aren't real quick between setting ->state != TASK_RUNNING and
calling schedule.

[ Both quick: as in O(1) and few instructions ]

So if we're going to do this, we'd need to audit all such code paths --
and there be lots.

The first line of attack for this problem is making wait_task_inactive()
sucks less, which shouldn't be too hard, that unconditional 1 jiffy
sleep is simply retarded.

> Index: linux.git/kernel/sched.c
> ===================================================================
> --- linux.git.orig/kernel/sched.c 2009-03-20 09:40:47.000000000 +0100
> +++ linux.git/kernel/sched.c 2009-03-20 10:28:56.000000000 +0100
> @@ -4632,6 +4632,10 @@ asmlinkage void __sched preempt_schedule
> if (likely(ti->preempt_count || irqs_disabled()))
> return;
>
> + /* No point in preempting we are just about to go to sleep. */
> + if (current->state != TASK_RUNNING)
> + return;
> +
> do {
> add_preempt_count(PREEMPT_ACTIVE);
> schedule();

2009-03-20 10:38:24

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [patch] don't preempt not TASK_RUNNING tasks

On Fri, 20 Mar 2009, Peter Zijlstra wrote:
> On Fri, 2009-03-20 at 10:43 +0100, Miklos Szeredi wrote:
> > Ingo,
> >
> > I tested this one, and I think it makes sense in any case as an
> > optimization. It should also be good for -stable kernels.
> >
> > Does it look OK?
>
> The idea is good, but there is a risk of preemption latencies here. Some
> code paths aren't real quick between setting ->state != TASK_RUNNING and
> calling schedule.
>
> [ Both quick: as in O(1) and few instructions ]
>
> So if we're going to do this, we'd need to audit all such code paths --
> and there be lots.

Oh, yes.

In a random sample the most common pattern is something like this:

spin_lock(&some_lock);
/* do something */
set_task_state(TASK_SOMESLEEP);
/* do something more */
spin_unlock(&some_lock);
schedule();
...

Which should only positively be impacted by the change. But I can
imagine rare cases where it's more complex.

> The first line of attack for this problem is making wait_task_inactive()
> sucks less, which shouldn't be too hard, that unconditional 1 jiffy
> sleep is simply retarded.

I completely agree. However, I'd like to have a non-invasive solution
that can go into current and stable kernels so UML users don't need to
suffer any more.

Thanks,
Miklos

>
> > Index: linux.git/kernel/sched.c
> > ===================================================================
> > --- linux.git.orig/kernel/sched.c 2009-03-20 09:40:47.000000000 +0100
> > +++ linux.git/kernel/sched.c 2009-03-20 10:28:56.000000000 +0100
> > @@ -4632,6 +4632,10 @@ asmlinkage void __sched preempt_schedule
> > if (likely(ti->preempt_count || irqs_disabled()))
> > return;
> >
> > + /* No point in preempting we are just about to go to sleep. */
> > + if (current->state != TASK_RUNNING)
> > + return;
> > +
> > do {
> > add_preempt_count(PREEMPT_ACTIVE);
> > schedule();
>

2009-03-20 10:54:37

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch] don't preempt not TASK_RUNNING tasks


* Miklos Szeredi <[email protected]> wrote:

> On Fri, 20 Mar 2009, Peter Zijlstra wrote:
> > On Fri, 2009-03-20 at 10:43 +0100, Miklos Szeredi wrote:
> > > Ingo,
> > >
> > > I tested this one, and I think it makes sense in any case as an
> > > optimization. It should also be good for -stable kernels.
> > >
> > > Does it look OK?
> >
> > The idea is good, but there is a risk of preemption latencies here. Some
> > code paths aren't real quick between setting ->state != TASK_RUNNING and
> > calling schedule.
> >
> > [ Both quick: as in O(1) and few instructions ]
> >
> > So if we're going to do this, we'd need to audit all such code paths --
> > and there be lots.
>
> Oh, yes.
>
> In a random sample the most common pattern is something like this:
>
> spin_lock(&some_lock);
> /* do something */
> set_task_state(TASK_SOMESLEEP);
> /* do something more */
> spin_unlock(&some_lock);
> schedule();
> ...
>
> Which should only positively be impacted by the change. But I can
> imagine rare cases where it's more complex.

I'd suggest spin_unlock_no_resched() and task_unlock_no_resched()
instead of open-coding preempt-disable sequences.

> > The first line of attack for this problem is making
> > wait_task_inactive() sucks less, which shouldn't be too hard,
> > that unconditional 1 jiffy sleep is simply retarded.
>
> I completely agree. However, I'd like to have a non-invasive
> solution that can go into current and stable kernels so UML users
> don't need to suffer any more.

Agreed. task_unlock_no_resched() should do that i think.

Ingo

2009-03-20 11:26:49

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [patch] don't preempt not TASK_RUNNING tasks

On Fri, 20 Mar 2009, Ingo Molnar wrote:
> > > The first line of attack for this problem is making
> > > wait_task_inactive() sucks less, which shouldn't be too hard,
> > > that unconditional 1 jiffy sleep is simply retarded.
> >
> > I completely agree. However, I'd like to have a non-invasive
> > solution that can go into current and stable kernels so UML users
> > don't need to suffer any more.
>
> Agreed. task_unlock_no_resched() should do that i think.

I don't see how that would help.

ptrace_stop() specifically would need read_unlock_no_resched(). But
I'm reluctant to add more spinlock functions with all their
variants...

Thanks,
Miklos

2009-03-20 11:40:57

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch] don't preempt not TASK_RUNNING tasks


* Miklos Szeredi <[email protected]> wrote:

> On Fri, 20 Mar 2009, Ingo Molnar wrote:
> > > > The first line of attack for this problem is making
> > > > wait_task_inactive() sucks less, which shouldn't be too hard,
> > > > that unconditional 1 jiffy sleep is simply retarded.
> > >
> > > I completely agree. However, I'd like to have a non-invasive
> > > solution that can go into current and stable kernels so UML users
> > > don't need to suffer any more.
> >
> > Agreed. task_unlock_no_resched() should do that i think.
>
> I don't see how that would help.

it more clearly expresses the need there, and we already have
_no_resched API variants (we add them on an as-needed basis).

Doing:

preempt_disable();
read_lock();
...
read_unlock();
preempt_enable_no_resched();

Really just open-codes read_unlock_no_resched() and uglifies the
code.

> ptrace_stop() specifically would need read_unlock_no_resched().
> But I'm reluctant to add more spinlock functions with all their
> variants...

if you worry about backportability, we can certainly add the easy
fix too, if it's followed by the more involved fix(es).

Ingo