2009-03-31 13:34:40

by Metzger, Markus T

Subject: [patch 3/21] x86, bts: wait until traced task has been scheduled out

In order to stop branch tracing for a running task, we need to first
clear the branch tracing control bits before we may free the tracing
buffer.
If the traced task is running, the cpu might still trace that task
after the branch trace control bits have been cleared.

Wait until the traced task has been scheduled out before proceeding.


A similar problem affects the task debug store context. We first remove
the context, then we need to wait until the task has been scheduled
out before we can free the context memory.
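
As an editorial illustration of the ordering this establishes, a hedged
sketch (bts_clear_control() and tracer->buffer are hypothetical
placeholder names, not identifiers from this patch):

	static void release_bts_sketch(struct bts_tracer *tracer,
				       struct task_struct *task)
	{
		/* 1. stop the cpu from writing further branch records */
		bts_clear_control(tracer);

		/* 2. a cpu currently running the task may keep using the
		 *    old configuration until the task schedules out once */
		wait_to_unschedule(task);

		/* 3. only now is the tracing buffer guaranteed unused */
		kfree(tracer->buffer);
	}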


Signed-off-by: Markus Metzger <[email protected]>
---

Index: git-tip/arch/x86/kernel/ds.c
===================================================================
--- git-tip.orig/arch/x86/kernel/ds.c 2009-03-30 17:19:14.000000000 +0200
+++ git-tip/arch/x86/kernel/ds.c 2009-03-30 17:20:11.000000000 +0200
@@ -250,6 +250,42 @@ static DEFINE_PER_CPU(struct ds_context
#define system_context per_cpu(system_context_array, smp_processor_id())


+/*
+ * Wait for the traced task to unschedule.
+ *
+ * This guarantees that the bts trace configuration has been
+ * synchronized with the cpu executing the task.
+ */
+static void wait_to_unschedule(struct task_struct *task)
+{
+	unsigned long nvcsw;
+	unsigned long nivcsw;
+
+	if (!task)
+		return;
+
+	if (task == current)
+		return;
+
+	nvcsw = task->nvcsw;
+	nivcsw = task->nivcsw;
+	for (;;) {
+		if (!task_is_running(task))
+			break;
+		/*
+		 * The switch count is incremented before the actual
+		 * context switch. We thus wait for two switches to be
+		 * sure at least one completed.
+		 */
+		if ((task->nvcsw - nvcsw) > 1)
+			break;
+		if ((task->nivcsw - nivcsw) > 1)
+			break;
+
+		schedule();
+	}
+}
+
static inline struct ds_context *ds_get_context(struct task_struct *task)
{
 	struct ds_context **p_context =
@@ -321,6 +357,9 @@ static inline void ds_put_context(struct

 	spin_unlock_irqrestore(&ds_lock, irq);

+	/* The context might still be in use for context switching. */
+	wait_to_unschedule(context->task);
+
 	kfree(context);
 }

@@ -789,6 +828,9 @@ void ds_release_bts(struct bts_tracer *t
 	WARN_ON_ONCE(tracer->ds.context->bts_master != tracer);
 	tracer->ds.context->bts_master = NULL;

+	/* Make sure tracing stopped and the tracer is not in use. */
+	wait_to_unschedule(tracer->ds.context->task);
+
 	put_tracer(tracer->ds.context->task);
 	ds_put_context(tracer->ds.context);



2009-04-01 00:22:21

by Oleg Nesterov

Subject: Re: [patch 3/21] x86, bts: wait until traced task has been scheduled out

On 03/31, Markus Metzger wrote:
>
> +static void wait_to_unschedule(struct task_struct *task)
> +{
> +	unsigned long nvcsw;
> +	unsigned long nivcsw;
> +
> +	if (!task)
> +		return;
> +
> +	if (task == current)
> +		return;
> +
> +	nvcsw = task->nvcsw;
> +	nivcsw = task->nivcsw;
> +	for (;;) {
> +		if (!task_is_running(task))
> +			break;
> +		/*
> +		 * The switch count is incremented before the actual
> +		 * context switch. We thus wait for two switches to be
> +		 * sure at least one completed.
> +		 */
> +		if ((task->nvcsw - nvcsw) > 1)
> +			break;
> +		if ((task->nivcsw - nivcsw) > 1)
> +			break;
> +
> +		schedule();

schedule() is a nop here. We can wait unpredictably long...

Ingo, do you have any ideas to improve this helper?

Not that I really like it, but how about

int force_unschedule(struct task_struct *p)
{
	struct rq *rq;
	unsigned long flags;
	int running;

	rq = task_rq_lock(p, &flags);
	running = task_running(rq, p);
	task_rq_unlock(rq, &flags);

	if (running)
		wake_up_process(rq->migration_thread);

	return running;
}

which should be used instead of task_is_running() ?
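
For illustration, a hedged editorial sketch of the patch's polling loop
with force_unschedule() swapped in for task_is_running(), assuming the
helper above:

	nvcsw = task->nvcsw;
	nivcsw = task->nivcsw;
	for (;;) {
		/* not running: it has scheduled out at least once */
		if (!force_unschedule(task))
			break;
		/* running: the migration thread has been kicked, so a
		 * context switch is forced rather than merely awaited */
		if ((task->nvcsw - nvcsw) > 1)
			break;
		if ((task->nivcsw - nivcsw) > 1)
			break;
		schedule();
	}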


We can even do something like

void wait_to_unschedule(struct task_struct *task)
{
	struct migration_req req;
	struct rq *rq;
	unsigned long flags;
	int running;

	rq = task_rq_lock(task, &flags);
	running = task_running(rq, task);
	if (running) {
		/* make sure __migrate_task() will do nothing */
		req.dest_cpu = NR_CPUS + 1;
		init_completion(&req.done);
		list_add(&req.list, &rq->migration_queue);
	}
	task_rq_unlock(rq, &flags);

	if (running) {
		wake_up_process(rq->migration_thread);
		wait_for_completion(&req.done);
	}
}

This way we don't poll, and we need only one helper.
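
The waiter above is released by the per-cpu migration thread. For
readers, a hedged paraphrase of that thread's main loop in kernel/sched.c
of this era (heavily simplified; details may differ):

	/* per-cpu migration thread, simplified sketch; rq and cpu are
	 * this thread's own runqueue and cpu */
	struct migration_req *req;

	while (!kthread_should_stop()) {
		spin_lock_irq(&rq->lock);
		if (list_empty(&rq->migration_queue)) {
			set_current_state(TASK_INTERRUPTIBLE);
			spin_unlock_irq(&rq->lock);
			schedule();
			continue;
		}
		req = list_entry(rq->migration_queue.next,
				 struct migration_req, list);
		list_del_init(&req->list);
		spin_unlock_irq(&rq->lock);

		/* with req->dest_cpu out of range, this is a nop */
		__migrate_task(req->task, cpu, req->dest_cpu);

		/* releases the wait_for_completion() above */
		complete(&req->done);
	}

Since the migration thread runs on the traced task's cpu, it can only
dequeue and complete the request after that task has scheduled out,
which is exactly the event being waited for.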

(Can't resist: this patch is not bisect-friendly; without the next patches,
wait_to_unschedule() is called under write_lock_irq, which is deadlockable.)

But anyway, I think we can do this later.

Oleg.

2009-04-01 00:30:59

by Oleg Nesterov

Subject: Re: [patch 3/21] x86, bts: wait until traced task has been scheduled out

Sorry for noise, forgot to mention...

On 03/31, Markus Metzger wrote:
>
> static inline struct ds_context *ds_get_context(struct task_struct *task)

Completely off-topic, but ds_get_context() is rather fat; imho it makes
sense to uninline it.

Oleg.

2009-04-01 08:10:45

by Metzger, Markus T

Subject: RE: [patch 3/21] x86, bts: wait until traced task has been scheduled out

>-----Original Message-----
>From: Oleg Nesterov [mailto:[email protected]]
>Sent: Wednesday, April 01, 2009 2:17 AM
>To: Metzger, Markus T


>> +static void wait_to_unschedule(struct task_struct *task)
>> +{
>> +	unsigned long nvcsw;
>> +	unsigned long nivcsw;
>> +
>> +	if (!task)
>> +		return;
>> +
>> +	if (task == current)
>> +		return;
>> +
>> +	nvcsw = task->nvcsw;
>> +	nivcsw = task->nivcsw;
>> +	for (;;) {
>> +		if (!task_is_running(task))
>> +			break;
>> +		/*
>> +		 * The switch count is incremented before the actual
>> +		 * context switch. We thus wait for two switches to be
>> +		 * sure at least one completed.
>> +		 */
>> +		if ((task->nvcsw - nvcsw) > 1)
>> +			break;
>> +		if ((task->nivcsw - nivcsw) > 1)
>> +			break;
>> +
>> +		schedule();
>
>schedule() is a nop here. We can wait unpredictably long...

Hmmm, as far as I understand the code, rt-workqueues use a higher sched_class
and thus cannot be preempted by normal threads. Non-rt workqueues
use the fair_sched_class, and schedule_work() uses a non-rt workqueue.

In practice, task is ptraced. It is either stopped or exiting.
I don't expect to loop very often.


>
>Ingo, do you have any ideas to improve this helper?
>
>Not that I really like it, but how about
>
> int force_unschedule(struct task_struct *p)
> {
> 	struct rq *rq;
> 	unsigned long flags;
> 	int running;
>
> 	rq = task_rq_lock(p, &flags);
> 	running = task_running(rq, p);
> 	task_rq_unlock(rq, &flags);
>
> 	if (running)
> 		wake_up_process(rq->migration_thread);
>
> 	return running;
> }
>
>which should be used instead of task_is_running() ?
>
>
>We can even do something like
>
> void wait_to_unschedule(struct task_struct *task)
> {
> 	struct migration_req req;
> 	struct rq *rq;
> 	unsigned long flags;
> 	int running;
>
> 	rq = task_rq_lock(task, &flags);
> 	running = task_running(rq, task);
> 	if (running) {
> 		/* make sure __migrate_task() will do nothing */
> 		req.dest_cpu = NR_CPUS + 1;
> 		init_completion(&req.done);
> 		list_add(&req.list, &rq->migration_queue);
> 	}
> 	task_rq_unlock(rq, &flags);
>
> 	if (running) {
> 		wake_up_process(rq->migration_thread);
> 		wait_for_completion(&req.done);
> 	}
> }
>
>This way we don't poll, and we need only one helper.
>
>(Can't resist: this patch is not bisect-friendly; without the next patches,
> wait_to_unschedule() is called under write_lock_irq, which is deadlockable.)

I know. See the reply to patch 0; I tried to keep the patches small and focused
to simplify the review work and attract reviewers.

thanks and regards,
markus.


2009-04-01 11:42:19

by Ingo Molnar

Subject: Re: [patch 3/21] x86, bts: wait until traced task has been scheduled out


* Oleg Nesterov <[email protected]> wrote:

> On 03/31, Markus Metzger wrote:
> >
> > +static void wait_to_unschedule(struct task_struct *task)
> > +{
> > > +	unsigned long nvcsw;
> > > +	unsigned long nivcsw;
> > > +
> > > +	if (!task)
> > > +		return;
> > > +
> > > +	if (task == current)
> > > +		return;
> > > +
> > > +	nvcsw = task->nvcsw;
> > > +	nivcsw = task->nivcsw;
> > > +	for (;;) {
> > > +		if (!task_is_running(task))
> > > +			break;
> > > +		/*
> > > +		 * The switch count is incremented before the actual
> > > +		 * context switch. We thus wait for two switches to be
> > > +		 * sure at least one completed.
> > > +		 */
> > > +		if ((task->nvcsw - nvcsw) > 1)
> > > +			break;
> > > +		if ((task->nivcsw - nivcsw) > 1)
> > > +			break;
> > > +
> > > +		schedule();
>
> schedule() is a nop here. We can wait unpredictably long...
>
> Ingo, do you have any ideas to improve this helper?

hm, there's a similar-looking existing facility:
wait_task_inactive(). Have I missed some subtle detail that makes it
inappropriate for use here?

> Not that I really like it, but how about
>
> int force_unschedule(struct task_struct *p)
> {
> > 	struct rq *rq;
> > 	unsigned long flags;
> > 	int running;
> >
> > 	rq = task_rq_lock(p, &flags);
> > 	running = task_running(rq, p);
> > 	task_rq_unlock(rq, &flags);
> >
> > 	if (running)
> > 		wake_up_process(rq->migration_thread);
> >
> > 	return running;
> }
>
> which should be used instead of task_is_running() ?

Yes - wait_task_inactive() should be switched to a scheme like that
- it would fix bugs like:

53da1d9: fix ptrace slowness

in a cleaner way.

> We can even do something like
>
> void wait_to_unschedule(struct task_struct *task)
> {
> > 	struct migration_req req;
> > 	struct rq *rq;
> > 	unsigned long flags;
> > 	int running;
> >
> > 	rq = task_rq_lock(task, &flags);
> > 	running = task_running(rq, task);
> > 	if (running) {
> > 		/* make sure __migrate_task() will do nothing */
> > 		req.dest_cpu = NR_CPUS + 1;
> > 		init_completion(&req.done);
> > 		list_add(&req.list, &rq->migration_queue);
> > 	}
> > 	task_rq_unlock(rq, &flags);
> >
> > 	if (running) {
> > 		wake_up_process(rq->migration_thread);
> > 		wait_for_completion(&req.done);
> > 	}
> }
>
> This way we don't poll, and we need only one helper.

Looks even better. The migration thread would run complete(), right?

A detail: I suspect this needs to be in a while() loop, for the case
that the victim task raced with us and went to another CPU before we
kicked it off via the migration thread.

This looks very useful to me. It could also be tested easily: revert
53da1d9 and you should see:

time strace dd if=/dev/zero of=/dev/null bs=1024 count=1000000

performance plummet on an SMP box. Then, with your fix, it should go
back up to near full speed again.

Ingo

2009-04-01 12:44:32

by Metzger, Markus T

Subject: RE: [patch 3/21] x86, bts: wait until traced task has been scheduled out

>-----Original Message-----
>From: Ingo Molnar [mailto:[email protected]]
>Sent: Wednesday, April 01, 2009 1:42 PM
>To: Oleg Nesterov; Peter Zijlstra


>* Oleg Nesterov <[email protected]> wrote:
>
>> On 03/31, Markus Metzger wrote:
>> >
>> > +static void wait_to_unschedule(struct task_struct *task)
>> > +{
>> > +	unsigned long nvcsw;
>> > +	unsigned long nivcsw;
>> > +
>> > +	if (!task)
>> > +		return;
>> > +
>> > +	if (task == current)
>> > +		return;
>> > +
>> > +	nvcsw = task->nvcsw;
>> > +	nivcsw = task->nivcsw;
>> > +	for (;;) {
>> > +		if (!task_is_running(task))
>> > +			break;
>> > +		/*
>> > +		 * The switch count is incremented before the actual
>> > +		 * context switch. We thus wait for two switches to be
>> > +		 * sure at least one completed.
>> > +		 */
>> > +		if ((task->nvcsw - nvcsw) > 1)
>> > +			break;
>> > +		if ((task->nivcsw - nivcsw) > 1)
>> > +			break;
>> > +
>> > +		schedule();
>>
>> schedule() is a nop here. We can wait unpredictably long...
>>
>> Ingo, do you have any ideas to improve this helper?
>
>hm, there's a similar-looking existing facility:
>wait_task_inactive(). Have I missed some subtle detail that makes it
>inappropriate for use here?


wait_task_inactive() waits until the task is no longer TASK_RUNNING.

I need to wait until the task has been scheduled out at least once.


regards,
markus.


2009-04-01 12:53:45

by Ingo Molnar

Subject: Re: [patch 3/21] x86, bts: wait until traced task has been scheduled out


* Metzger, Markus T <[email protected]> wrote:

> >-----Original Message-----
> >From: Ingo Molnar [mailto:[email protected]]
> >Sent: Wednesday, April 01, 2009 1:42 PM
> >To: Oleg Nesterov; Peter Zijlstra
>
>
> >* Oleg Nesterov <[email protected]> wrote:
> >
> >> On 03/31, Markus Metzger wrote:
> >> >
> >> > +static void wait_to_unschedule(struct task_struct *task)
> >> > +{
> >> > +	unsigned long nvcsw;
> >> > +	unsigned long nivcsw;
> >> > +
> >> > +	if (!task)
> >> > +		return;
> >> > +
> >> > +	if (task == current)
> >> > +		return;
> >> > +
> >> > +	nvcsw = task->nvcsw;
> >> > +	nivcsw = task->nivcsw;
> >> > +	for (;;) {
> >> > +		if (!task_is_running(task))
> >> > +			break;
> >> > +		/*
> >> > +		 * The switch count is incremented before the actual
> >> > +		 * context switch. We thus wait for two switches to be
> >> > +		 * sure at least one completed.
> >> > +		 */
> >> > +		if ((task->nvcsw - nvcsw) > 1)
> >> > +			break;
> >> > +		if ((task->nivcsw - nivcsw) > 1)
> >> > +			break;
> >> > +
> >> > +		schedule();
> >>
> >> schedule() is a nop here. We can wait unpredictably long...
> >>
> >> Ingo, do you have any ideas to improve this helper?
> >
> >hm, there's a similar-looking existing facility:
> >wait_task_inactive(). Have I missed some subtle detail that makes it
> >inappropriate for use here?
>
> wait_task_inactive() waits until the task is no longer
> TASK_RUNNING.

No, that's wrong, wait_task_inactive() waits until the task
deschedules.

Ingo

2009-04-01 19:08:43

by Oleg Nesterov

Subject: Re: [patch 3/21] x86, bts: wait until traced task has been scheduled out

On 04/01, Metzger, Markus T wrote:
>
> >-----Original Message-----
> >From: Oleg Nesterov [mailto:[email protected]]
> >Sent: Wednesday, April 01, 2009 2:17 AM
> >To: Metzger, Markus T
>
> >> +static void wait_to_unschedule(struct task_struct *task)
> >> +{
> >> +	unsigned long nvcsw;
> >> +	unsigned long nivcsw;
> >> +
> >> +	if (!task)
> >> +		return;
> >> +
> >> +	if (task == current)
> >> +		return;
> >> +
> >> +	nvcsw = task->nvcsw;
> >> +	nivcsw = task->nivcsw;
> >> +	for (;;) {
> >> +		if (!task_is_running(task))
> >> +			break;
> >> +		/*
> >> +		 * The switch count is incremented before the actual
> >> +		 * context switch. We thus wait for two switches to be
> >> +		 * sure at least one completed.
> >> +		 */
> >> +		if ((task->nvcsw - nvcsw) > 1)
> >> +			break;
> >> +		if ((task->nivcsw - nivcsw) > 1)
> >> +			break;
> >> +
> >> +		schedule();
> >
> >schedule() is a nop here. We can wait unpredictably long...
>
> Hmmm, as far as I understand the code, rt-workqueues use a higher sched_class
> and thus cannot be preempted by normal threads. Non-rt workqueues
> use the fair_sched_class, and schedule_work() uses a non-rt workqueue.

I was unclear, sorry.

I meant, in this case

while (!CONDITION)
	schedule();

is not better compared to

while (!CONDITION)
	; /* do nothing */

(OK, schedule() is better without CONFIG_PREEMPT, but this doesn't matter).
wait_to_unschedule() just spins waiting for ->nXvcsw, this is not optimal.

And another problem: we can wait unpredictably long, because

> In practice, task is ptraced. It is either stopped or exiting.
> I don't expect to loop very often.

No. The task _was_ ptraced when we called (say) ptrace_detach(). But when
work->func() runs, the tracee is not traced, it is running (not necessarily,
of course; the tracer _can_ leave it in TASK_STOPPED).

Now, again, suppose that this task does "for (;;) ;" in user-space.
If the CPU is "free", it can spin "forever" without re-scheduling. Yes, sure,
this case is not likely in practice, but still.

Oleg.

2009-04-01 19:49:34

by Oleg Nesterov

Subject: Re: [patch 3/21] x86, bts: wait until traced task has been scheduled out

On 04/01, Ingo Molnar wrote:
>
> * Oleg Nesterov <[email protected]> wrote:
>
> > On 03/31, Markus Metzger wrote:
> > >
> > > +static void wait_to_unschedule(struct task_struct *task)
> > > +{
> > > +	unsigned long nvcsw;
> > > +	unsigned long nivcsw;
> > > +
> > > +	if (!task)
> > > +		return;
> > > +
> > > +	if (task == current)
> > > +		return;
> > > +
> > > +	nvcsw = task->nvcsw;
> > > +	nivcsw = task->nivcsw;
> > > +	for (;;) {
> > > +		if (!task_is_running(task))
> > > +			break;
> > > +		/*
> > > +		 * The switch count is incremented before the actual
> > > +		 * context switch. We thus wait for two switches to be
> > > +		 * sure at least one completed.
> > > +		 */
> > > +		if ((task->nvcsw - nvcsw) > 1)
> > > +			break;
> > > +		if ((task->nivcsw - nivcsw) > 1)
> > > +			break;
> > > +
> > > +		schedule();
> >
> > schedule() is a nop here. We can wait unpredictably long...
> >
> > Ingo, do you have any ideas to improve this helper?
>
> hm, there's a similar-looking existing facility:
> wait_task_inactive(). Have I missed some subtle detail that makes it
> inappropriate for use here?

Yes, they are similar, but still different.

wait_to_unschedule(task) waits until this task does a context switch at
least once. It is fine if this task runs again when wait_to_unschedule()
returns. (If !task_is_running(task), it already did a context switch.)

wait_task_inactive() ensures that this task is deactivated. It can't be
used here, because it can "never" be deactivated.
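
An editorial illustration of the difference, borrowing the "for (;;) ;"
example Oleg gives later in this thread (hypothetical user-space tracee):

	/* Always runnable: it never deactivates, so wait_task_inactive()
	 * could block forever; yet a single involuntary preemption is
	 * enough to satisfy wait_to_unschedule(). */
	int main(void)
	{
		for (;;)
			;
	}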

> > int force_unschedule(struct task_struct *p)
> > {
> > 	struct rq *rq;
> > 	unsigned long flags;
> > 	int running;
> >
> > 	rq = task_rq_lock(p, &flags);
> > 	running = task_running(rq, p);
> > 	task_rq_unlock(rq, &flags);
> >
> > 	if (running)
> > 		wake_up_process(rq->migration_thread);
> >
> > 	return running;
> > }
> >
> > which should be used instead of task_is_running() ?
>
> Yes - wait_task_inactive() should be switched to a scheme like that

Yes, I thought about this, perhaps we can improve wait_task_inactive()
a bit. Unfortunately, this is not enough to kill schedule_timeout(1).

> - it would fix bugs like:
>
> 53da1d9: fix ptrace slowness

I don't think so. Quite the contrary: the problem with "fix ptrace slowness"
is that we do not want the TASK_TRACED task to be preempted before it
does the voluntary schedule() (without PREEMPT_ACTIVE).

> > void wait_to_unschedule(struct task_struct *task)
> > {
> > 	struct migration_req req;
> > 	struct rq *rq;
> > 	unsigned long flags;
> > 	int running;
> >
> > 	rq = task_rq_lock(task, &flags);
> > 	running = task_running(rq, task);
> > 	if (running) {
> > 		/* make sure __migrate_task() will do nothing */
> > 		req.dest_cpu = NR_CPUS + 1;
> > 		init_completion(&req.done);
> > 		list_add(&req.list, &rq->migration_queue);
> > 	}
> > 	task_rq_unlock(rq, &flags);
> >
> > 	if (running) {
> > 		wake_up_process(rq->migration_thread);
> > 		wait_for_completion(&req.done);
> > 	}
> > }
> >
> > This way we don't poll, and we need only one helper.
>
> Looks even better. The migration thread would run complete(), right?

Yes,

> A detail: I suspect this needs to be in a while() loop, for the case
> that the victim task raced with us and went to another CPU before we
> kicked it off via the migration thread.

I think this doesn't matter. If the task is not running - we don't
care and do nothing. If it is running and migrates - it should do
a context switch at least once.

But the code above is not right wrt cpu hotplug. wake_up_process()
can hit a NULL rq->migration_thread if we race with CPU_DEAD.

Hmm, don't we have this problem in, say, set_cpus_allowed_ptr()?
Unless it is called under get_online_cpus(), ->migration_thread
can go away once we drop rq->lock.

Perhaps we need something like this

--- kernel/sched.c
+++ kernel/sched.c
@@ -6132,8 +6132,10 @@ int set_cpus_allowed_ptr(struct task_str

 	if (migrate_task(p, cpumask_any_and(cpu_online_mask, new_mask), &req)) {
 		/* Need help from migration thread: drop lock and wait. */
+		preempt_disable();
 		task_rq_unlock(rq, &flags);
 		wake_up_process(rq->migration_thread);
+		preempt_enable();
 		wait_for_completion(&req.done);
 		tlb_migrate_finish(p->mm);
 		return 0;

?

Oleg.

2009-04-01 19:53:17

by Markus Metzger

Subject: Re: [patch 3/21] x86, bts: wait until traced task has been scheduled out

On Wed, 2009-04-01 at 21:04 +0200, Oleg Nesterov wrote:
> On 04/01, Metzger, Markus T wrote:
> >
> > >-----Original Message-----
> > >From: Oleg Nesterov [mailto:[email protected]]
> > >Sent: Wednesday, April 01, 2009 2:17 AM
> > >To: Metzger, Markus T
> >
> > >> +static void wait_to_unschedule(struct task_struct *task)
> > >> +{
> > >> +	unsigned long nvcsw;
> > >> +	unsigned long nivcsw;
> > >> +
> > >> +	if (!task)
> > >> +		return;
> > >> +
> > >> +	if (task == current)
> > >> +		return;
> > >> +
> > >> +	nvcsw = task->nvcsw;
> > >> +	nivcsw = task->nivcsw;
> > >> +	for (;;) {
> > >> +		if (!task_is_running(task))
> > >> +			break;
> > >> +		/*
> > >> +		 * The switch count is incremented before the actual
> > >> +		 * context switch. We thus wait for two switches to be
> > >> +		 * sure at least one completed.
> > >> +		 */
> > >> +		if ((task->nvcsw - nvcsw) > 1)
> > >> +			break;
> > >> +		if ((task->nivcsw - nivcsw) > 1)
> > >> +			break;
> > >> +
> > >> +		schedule();
> > >
> > >schedule() is a nop here. We can wait unpredictably long...
> >
> > Hmmm, as far as I understand the code, rt-workqueues use a higher sched_class
> > and thus cannot be preempted by normal threads. Non-rt workqueues
> > use the fair_sched_class, and schedule_work() uses a non-rt workqueue.
>
> I was unclear, sorry.
>
> I meant, in this case
>
> while (!CONDITION)
> 	schedule();
>
> is not better compared to
>
> while (!CONDITION)
> ; /* do nothing */
>
> (OK, schedule() is better without CONFIG_PREEMPT, but this doesn't matter).
> wait_to_unschedule() just spins waiting for ->nXvcsw, this is not optimal.
>
> And another problem: we can wait unpredictably long, because
>
> > In practice, task is ptraced. It is either stopped or exiting.
> > I don't expect to loop very often.
>
> No. The task _was_ ptraced when we called (say) ptrace_detach(). But when
> work->func() runs, the tracee is not traced, it is running (not necessarily,
> of course; the tracer _can_ leave it in TASK_STOPPED).
>
> Now, again, suppose that this task does "for (;;) ;" in user-space.
> If the CPU is "free", it can spin "forever" without re-scheduling. Yes, sure,
> this case is not likely in practice, but still.

So I should rather not call schedule()?

I thought it's better to yield the cpu than to spin.


I will resend a bisect-friendly version of the series (using quilt mail,
this time) tomorrow.

I will remove schedule() in the wait_to_unschedule() loop and also
address the minor nitpicks you mentioned in your other reviews.

thanks,
markus.