2009-03-19 22:25:14

by Miklos Szeredi

[permalink] [raw]
Subject: [patch] fix uml slowness caused by ptrace preemption bug on host

From: Miklos Szeredi <[email protected]>

This patch fixes bug #12208.

This turned out to be not a scheduler regression, but an already
existing problem in ptrace being triggered by subtle scheduler
changes.

The problem is this:

- task A is ptracing task B
- task B stops on a trace event
- task A is woken up and preempts task B
- task A calls ptrace on task B, which does ptrace_check_attach()
- this calls wait_task_inactive(), which sees that task B is still on the runq
- task A goes to sleep for a jiffy
- ...

Since UML does lots of the above sequences, those jiffies quickly add
up to make it slow as hell.

This patch solves this by not scheduling on preempt_enable() after
ptrace_stop() has woken up the tracer.

Signed-off-by: Miklos Szeredi <[email protected]>
---
kernel/signal.c | 7 +++++++
1 file changed, 7 insertions(+)

Index: linux.git/kernel/signal.c
===================================================================
--- linux.git.orig/kernel/signal.c 2009-03-19 22:48:59.000000000 +0100
+++ linux.git/kernel/signal.c 2009-03-19 22:54:22.000000000 +0100
@@ -1572,10 +1572,16 @@ static void ptrace_stop(int exit_code, i
/* Let the debugger run. */
__set_current_state(TASK_TRACED);
spin_unlock_irq(&current->sighand->siglock);
+ preempt_disable();
read_lock(&tasklist_lock);
if (may_ptrace_stop()) {
do_notify_parent_cldstop(current, CLD_TRAPPED);
read_unlock(&tasklist_lock);
+ /*
+ * Don't want to allow preemption here, because
+ * sys_ptrace() needs this task to be inactive.
+ */
+ preempt_enable_no_resched();
schedule();
} else {
/*
@@ -1586,6 +1592,7 @@ static void ptrace_stop(int exit_code, i
if (clear_code)
current->exit_code = 0;
read_unlock(&tasklist_lock);
+ preempt_enable();
}

/*


2009-03-19 23:34:32

by Roland McGrath

[permalink] [raw]
Subject: Re: [patch] fix uml slowness caused by ptrace preemption bug on host

I'm no scheduler expert and I don't know whether the exact placement in
your change is the optimal one. But it's certainly fine from a ptrace
perspective.


Thanks,
Roland

2009-03-20 08:06:43

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [patch] fix uml slowness caused by ptrace preemption bug on host

On Thu, 19 Mar 2009, Roland McGrath wrote:
> I'm no scheduler expert and I don't know whether the exact placement in
> your change is the optimal one. But it's certainly fine from a ptrace
> perspective.

I'm no scheduler expert either.

Maybe a more generic solution in the scheduler (something like this
totally untested patch) would be better? What say you, scheduler
experts?

Thanks,
Miklos


Index: linux.git/kernel/sched.c
===================================================================
--- linux.git.orig/kernel/sched.c 2009-03-18 12:53:47.000000000 +0100
+++ linux.git/kernel/sched.c 2009-03-20 08:58:13.000000000 +0100
@@ -4629,7 +4629,8 @@ asmlinkage void __sched preempt_schedule
* If there is a non-zero preempt_count or interrupts are disabled,
* we do not want to preempt the current task. Just return..
*/
- if (likely(ti->preempt_count || irqs_disabled()))
+ if (likely(ti->preempt_count || irqs_disabled() ||
+ current->state != TASK_RUNNING))
return;

do {

2009-03-20 08:20:42

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [patch] fix uml slowness caused by ptrace preemption bug on host

On Thu, 2009-03-19 at 23:23 +0100, Miklos Szeredi wrote:
> From: Miklos Szeredi <[email protected]>
>
> This patch fixes bug #12208.
>
> This turned out to be not a scheduler regression, but an already
> existing problem in ptrace being triggered by subtle scheduler
> changes.
>
> The problem is this:
>
> - task A is ptracing task B
> - task B stops on a trace event
> - task A is woken up and preempts task B
> - task A calls ptrace on task B, which does ptrace_check_attach()
> - this calls wait_task_inactive(), which sees that task B is still on the runq
> - task A goes to sleep for a jiffy
> - ...
>
> Since UML does lots of the above sequences, those jiffies quickly add
> up to make it slow as hell.
>
> This patch solves this by not scheduling on preempt_enable() after
> ptrace_stop() has woken up the tracer.

Nice,.. however did you find this?

Ingo is looking at changing wait_task_inactive() to not be quite so
stupid. I'll let him respond with more details when he's done poking at
the code :-)

2009-03-20 08:28:25

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [patch] fix uml slowness caused by ptrace preemption bug on host

On Fri, 20 Mar 2009, Peter Zijlstra wrote:
> On Thu, 2009-03-19 at 23:23 +0100, Miklos Szeredi wrote:
> >
> > This patch solves this by not scheduling on preempt_enable() after
> > ptrace_stop() has woken up the tracer.
>
> Nice,.. however did you find this?

Ftrace helped a lot, it's a really cool tool :). I had to patch it
with this, otherwise the timestamps would be totally off:

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index bd38c5c..557c2dd 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -108,7 +108,7 @@ u64 ring_buffer_time_stamp(int cpu)

preempt_disable_notrace();
/* shift to debug/test normalization and TIME_EXTENTS */
- time = sched_clock() << DEBUG_SHIFT;
+ time = cpu_clock(cpu) << DEBUG_SHIFT;
preempt_enable_no_resched_notrace();

return time;

2009-03-20 08:32:21

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch] fix uml slowness caused by ptrace preemption bug on host


* Miklos Szeredi <[email protected]> wrote:

> On Fri, 20 Mar 2009, Peter Zijlstra wrote:
> > On Thu, 2009-03-19 at 23:23 +0100, Miklos Szeredi wrote:
> > >
> > > This patch solves this by not scheduling on preempt_enable() after
> > > ptrace_stop() has woken up the tracer.
> >
> > Nice,.. however did you find this?
>
> Ftrace helped a lot, it's a really cool tool :). I had to patch it
> with this, otherwise the timestamps would be totally off:
>
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index bd38c5c..557c2dd 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -108,7 +108,7 @@ u64 ring_buffer_time_stamp(int cpu)
>
> preempt_disable_notrace();
> /* shift to debug/test normalization and TIME_EXTENTS */
> - time = sched_clock() << DEBUG_SHIFT;
> + time = cpu_clock(cpu) << DEBUG_SHIFT;
> preempt_enable_no_resched_notrace();

Btw., based on your earlier report, the same is now possible in the
latest tracing tree via:

echo 1 > /debug/tracing/options/global_clock

Ingo

2009-03-20 13:56:02

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] fix uml slowness caused by ptrace preemption bug on host

On 03/19, Roland McGrath wrote:
>
> I'm no scheduler expert and I don't know whether the exact placement in
> your change is the optimal one.

Agreed, can't we do a bit more simple patch?

--- kernel/signal.c
+++ kernel/signal.c
@@ -1572,8 +1572,10 @@ static void ptrace_stop(int exit_code, i
spin_unlock_irq(&current->sighand->siglock);
read_lock(&tasklist_lock);
if (may_ptrace_stop()) {
+ preempt_disable();
do_notify_parent_cldstop(current, CLD_TRAPPED);
read_unlock(&tasklist_lock);
+ preempt_enable_no_resched();
schedule();
} else {
/*

Yes, the task can be preempted right after spin_unlock(->siglock), but
this is unlikely. We need the "synchronous" wakeup, and this patch helps
as well.



Actually, I don't know which ptrace requests really need to make sure
the tracee was deactivated. Perhaps they can call wait_task_inactive()
themselves? I guess this is bad idea, but most of requests definitely
do not need wait_task_inactive().

Oleg.

2009-03-20 14:08:20

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] fix uml slowness caused by ptrace preemption bug on host

On 03/20, Miklos Szeredi wrote:
>
> I'm no scheduler expert either.

neither me ;)

> --- linux.git.orig/kernel/sched.c 2009-03-18 12:53:47.000000000 +0100
> +++ linux.git/kernel/sched.c 2009-03-20 08:58:13.000000000 +0100
> @@ -4629,7 +4629,8 @@ asmlinkage void __sched preempt_schedule
> * If there is a non-zero preempt_count or interrupts are disabled,
> * we do not want to preempt the current task. Just return..
> */
> - if (likely(ti->preempt_count || irqs_disabled()))
> + if (likely(ti->preempt_count || irqs_disabled() ||
> + current->state != TASK_RUNNING))

But this was specially designed to allow to preempt !TASK_RUNNING tasks,
note the "if (prev->state && !(preempt_count() & PREEMPT_ACTIVE))" in
schedule().

Perhaps "|| current->state == TASK_TRACED" makes more sense, TASK_TRACED
is special because we know we are going to schedule really soon. But I think
your previous patch is better, imho we should change preempt_schedule() to
fix the very specific problem with ptrace_notify().

Oleg.