2005-05-31 07:28:27

by Shaohua Li

[permalink] [raw]
Subject: [PATCH]CPU hotplug breaks wake_up_new_task

Hi,
There is a race condition at wake_up_new_task at CPU hotplug case.
Say do_fork
copy_process (which sets new forked task's current cpu, cpu_allowed)
<-------- the new forked task's current cpu is offline
wake_up_new_task
wake_up_new_task will put the forked task into a dead cpu.

Thanks,
Shaohua

Signed-off-by: Shaohua Li<[email protected]>
---

linux-2.6.11-rc5-mm1-root/kernel/sched.c | 25 +++++++++++++++++++++++--
1 files changed, 23 insertions(+), 2 deletions(-)

diff -puN kernel/sched.c~wake_up_new_task_to_online_cpu kernel/sched.c
--- linux-2.6.11-rc5-mm1/kernel/sched.c~wake_up_new_task_to_online_cpu 2005-05-31 13:39:43.888682784 +0800
+++ linux-2.6.11-rc5-mm1-root/kernel/sched.c 2005-05-31 14:49:58.537958704 +0800
@@ -1412,6 +1412,10 @@ void fastcall sched_fork(task_t *p, int
put_cpu();
}

+#ifdef CONFIG_HOTPLUG_CPU
+static int task_select_online_cpu(int dead_cpu, struct task_struct *tsk);
+#endif
+
/*
* wake_up_new_task - wake up a newly created task for the first time.
*
@@ -1426,9 +1430,20 @@ void fastcall wake_up_new_task(task_t *
runqueue_t *rq, *this_rq;

rq = task_rq_lock(p, &flags);
- BUG_ON(p->state != TASK_RUNNING);
this_cpu = smp_processor_id();
cpu = task_cpu(p);
+#ifdef CONFIG_HOTPLUG_CPU
+ while (!cpu_online(cpu)) {
+ cpu = task_select_online_cpu(cpu, p);
+ set_task_cpu(p, cpu);
+ task_rq_unlock(rq, &flags);
+ /* CPU hotplug might occur here */
+ rq = task_rq_lock(p, &flags);
+ this_cpu = smp_processor_id();
+ cpu = task_cpu(p);
+ }
+#endif
+ BUG_ON(p->state != TASK_RUNNING);

/*
* We decrease the sleep average of forking parents
@@ -4457,7 +4472,7 @@ wait_to_die:

#ifdef CONFIG_HOTPLUG_CPU
/* Figure out where task on dead CPU should go, use force if neccessary. */
-static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *tsk)
+static int task_select_online_cpu(int dead_cpu, struct task_struct *tsk)
{
int dest_cpu;
cpumask_t mask;
@@ -4486,6 +4501,12 @@ static void move_task_off_dead_cpu(int d
"longer affine to cpu%d\n",
tsk->pid, tsk->comm, dead_cpu);
}
+ return dest_cpu;
+}
+
+static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *tsk)
+{
+ int dest_cpu = task_select_online_cpu(dead_cpu, tsk);
__migrate_task(tsk, dead_cpu, dest_cpu);
}

_



2005-05-31 08:00:52

by Ashok Raj

[permalink] [raw]
Subject: Re: [PATCH]CPU hotplug breaks wake_up_new_task

On Tue, May 31, 2005 at 12:35:09AM -0700, Shaohua Li wrote:
>
> Hi,
> There is a race condition at wake_up_new_task at CPU hotplug case.
> Say do_fork
> copy_process (which sets new forked task's current cpu,
> cpu_allowed)
> <-------- the new forked task's current cpu is offline
> wake_up_new_task
> wake_up_new_task will put the forked task into a dead cpu.
>
> Thanks,
> Shaohua
>
> Signed-off-by: Shaohua Li<[email protected]>
> ---
>
> linux-2.6.11-rc5-mm1-root/kernel/sched.c | 25
.... Deleted....
> /*
> * wake_up_new_task - wake up a newly created task for the first
> time.
> *
> @@ -1426,9 +1430,20 @@ void fastcall wake_up_new_task(task_t *
> runqueue_t *rq, *this_rq;
>
> rq = task_rq_lock(p, &flags);
> - BUG_ON(p->state != TASK_RUNNING);
> this_cpu = smp_processor_id();
> cpu = task_cpu(p);
> +#ifdef CONFIG_HOTPLUG_CPU
> + while (!cpu_online(cpu)) {
> + cpu = task_select_online_cpu(cpu, p);
> + set_task_cpu(p, cpu);
> + task_rq_unlock(rq, &flags);
> + /* CPU hotplug might occur here */
> + rq = task_rq_lock(p, &flags);
> + this_cpu = smp_processor_id();
> + cpu = task_cpu(p);
> + }
> +#endif

The while() loop doesnt look pretty here.. could you try to
disable preempt, and see the problem goes away? or use
get_cpu()/put_cpu() combo when you get this_cpu?

Just wondering if the code would be a little more simpler in this
case.

Nick should have something to say ...

Cheers,
ashok

2005-05-31 08:28:17

by Shaohua Li

[permalink] [raw]
Subject: Re: [PATCH]CPU hotplug breaks wake_up_new_task

On Tue, 2005-05-31 at 01:00 -0700, Ashok Raj wrote:
> On Tue, May 31, 2005 at 12:35:09AM -0700, Shaohua Li wrote:
> >
> > Hi,
> > There is a race condition at wake_up_new_task at CPU hotplug case.
> > Say do_fork
> > copy_process (which sets new forked task's current cpu,
> > cpu_allowed)
> > <-------- the new forked task's current cpu is offline
> > wake_up_new_task
> > wake_up_new_task will put the forked task into a dead cpu.
> >
>
> The while() loop doesnt look pretty here.. could you try to
> disable preempt, and see the problem goes away? or use
> get_cpu()/put_cpu() combo when you get this_cpu?
>
> Just wondering if the code would be a little more simpler in this
> case.
I must be over considering. Ok, how does this updated one look?


Signed-off-by: Shaohua Li<[email protected]>
---

linux-2.6.11-rc5-mm1-root/kernel/sched.c | 23 +++++++++++++++++++++--
1 files changed, 21 insertions(+), 2 deletions(-)

diff -puN kernel/sched.c~wake_up_new_task_to_online_cpu kernel/sched.c
--- linux-2.6.11-rc5-mm1/kernel/sched.c~wake_up_new_task_to_online_cpu 2005-05-31 13:39:43.888682784 +0800
+++ linux-2.6.11-rc5-mm1-root/kernel/sched.c 2005-05-31 16:14:37.390855704 +0800
@@ -1412,6 +1412,10 @@ void fastcall sched_fork(task_t *p, int
put_cpu();
}

+#ifdef CONFIG_HOTPLUG_CPU
+static int task_select_online_cpu(int dead_cpu, struct task_struct *tsk);
+#endif
+
/*
* wake_up_new_task - wake up a newly created task for the first time.
*
@@ -1425,10 +1429,18 @@ void fastcall wake_up_new_task(task_t *
int this_cpu, cpu;
runqueue_t *rq, *this_rq;

+ this_cpu = get_cpu();
rq = task_rq_lock(p, &flags);
BUG_ON(p->state != TASK_RUNNING);
- this_cpu = smp_processor_id();
cpu = task_cpu(p);
+#ifdef CONFIG_HOTPLUG_CPU
+ if (!cpu_online(cpu)) {
+ cpu = task_select_online_cpu(cpu, p);
+ set_task_cpu(p, cpu);
+ task_rq_unlock(rq, &flags);
+ rq = task_rq_lock(p, &flags);
+ }
+#endif

/*
* We decrease the sleep average of forking parents
@@ -1491,6 +1503,7 @@ void fastcall wake_up_new_task(task_t *
current->sleep_avg = JIFFIES_TO_NS(CURRENT_BONUS(current) *
PARENT_PENALTY / 100 * MAX_SLEEP_AVG / MAX_BONUS);
task_rq_unlock(this_rq, &flags);
+ put_cpu();
}

/*
@@ -4457,7 +4470,7 @@ wait_to_die:

#ifdef CONFIG_HOTPLUG_CPU
/* Figure out where task on dead CPU should go, use force if neccessary. */
-static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *tsk)
+static int task_select_online_cpu(int dead_cpu, struct task_struct *tsk)
{
int dest_cpu;
cpumask_t mask;
@@ -4486,6 +4499,12 @@ static void move_task_off_dead_cpu(int d
"longer affine to cpu%d\n",
tsk->pid, tsk->comm, dead_cpu);
}
+ return dest_cpu;
+}
+
+static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *tsk)
+{
+ int dest_cpu = task_select_online_cpu(dead_cpu, tsk);
__migrate_task(tsk, dead_cpu, dest_cpu);
}

_


2005-05-31 08:50:37

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH]CPU hotplug breaks wake_up_new_task

Shaohua Li wrote:

> I must be over considering. Ok, how does this updated one look?
>
>

Looks like you've found a race, alright. Nice work!

I think it would be preferable to do the check in kernel/fork.c,
after the tasklist lock is taken (and you'll need to rediff the
patch for the -mm tree).

One problem with doing it in wake_up_new_task is that I think
some newly forked tasks don't get woken up by wake_up_new_task!

Thanks,
Nick
Send instant messages to your online friends http://au.messenger.yahoo.com

2005-05-31 09:04:09

by Shaohua Li

[permalink] [raw]
Subject: Re: [PATCH]CPU hotplug breaks wake_up_new_task

On Tue, 2005-05-31 at 18:50 +1000, Nick Piggin wrote:
> Shaohua Li wrote:
>
> > I must be over considering. Ok, how does this updated one look?
> >
> >
>
> Looks like you've found a race, alright. Nice work!
>
> I think it would be preferable to do the check in kernel/fork.c,
> after the tasklist lock is taken (and you'll need to rediff the
> patch for the -mm tree).
It seems there is still a race between copy_process and wake_up_new_task
to me (cpu offline after copy_process). Am I missing anything?

Thanks,
Shaohua

2005-05-31 09:09:12

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH]CPU hotplug breaks wake_up_new_task

Shaohua Li wrote:
> On Tue, 2005-05-31 at 18:50 +1000, Nick Piggin wrote:
>
>>Shaohua Li wrote:
>>
>>
>>>I must be over considering. Ok, how does this updated one look?
>>>
>>>
>>
>>Looks like you've found a race, alright. Nice work!
>>
>>I think it would be preferable to do the check in kernel/fork.c,
>>after the tasklist lock is taken (and you'll need to rediff the
>>patch for the -mm tree).
>
> It seems there is still a race between copy_process and wake_up_new_task
> to me (cpu offline after copy_process). Am I missing anything?
>

Offlining the CPU takes the tasklist lock to migrate off tasks.
So either the CPU will be offline first, in which case the a
check in kernel/fork.c will pick that up; or the task will be
added to the tasklist first, in which case CPU hotplug should
correctly migrate it away.

I think?

--
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com

2005-05-31 09:40:50

by Srivatsa Vaddagiri

[permalink] [raw]
Subject: Re: [PATCH]CPU hotplug breaks wake_up_new_task

On Tue, May 31, 2005 at 07:29:39AM +0000, Shaohua Li wrote:
> Hi,
> There is a race condition at wake_up_new_task at CPU hotplug case.
> Say do_fork
> copy_process (which sets new forked task's current cpu, cpu_allowed)
> <-------- the new forked task's current cpu is offline
> wake_up_new_task
> wake_up_new_task will put the forked task into a dead cpu.

This was noticed/fixed long back. Apparently somebody has reintroduced
the bug. The simple fix for this race is:


--- kernel/fork.c.org 2005-05-31 14:57:15.000000000 +0530
+++ kernel/fork.c 2005-05-31 15:07:20.000000000 +0530
@@ -1024,8 +1024,7 @@ static task_t *copy_process(unsigned lon
* parent's CPU). This avoids alot of nasty races.
*/
p->cpus_allowed = current->cpus_allowed;
- if (unlikely(!cpu_isset(task_cpu(p), p->cpus_allowed)))
- set_task_cpu(p, smp_processor_id());
+ set_task_cpu(p, smp_processor_id());

/*
* Check for pending SIGKILL! The new thread should not be allowed

Could you test and check if it avoids whatever problem you are seeing?


--


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2005-05-31 09:46:23

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH]CPU hotplug breaks wake_up_new_task

Srivatsa Vaddagiri wrote:
> On Tue, May 31, 2005 at 07:29:39AM +0000, Shaohua Li wrote:
>
>>Hi,
>>There is a race condition at wake_up_new_task at CPU hotplug case.
>>Say do_fork
>> copy_process (which sets new forked task's current cpu, cpu_allowed)
>> <-------- the new forked task's current cpu is offline
>> wake_up_new_task
>>wake_up_new_task will put the forked task into a dead cpu.
>
>
> This was noticed/fixed long back. Apparently somebody has reintroduced
> the bug. The simple fix for this race is:
>

If you're looking at the -mm tree, then I think I must
have reintroduced the bug. It needs a comment.

>
> --- kernel/fork.c.org 2005-05-31 14:57:15.000000000 +0530
> +++ kernel/fork.c 2005-05-31 15:07:20.000000000 +0530
> @@ -1024,8 +1024,7 @@ static task_t *copy_process(unsigned lon
> * parent's CPU). This avoids alot of nasty races.
> */
> p->cpus_allowed = current->cpus_allowed;
> - if (unlikely(!cpu_isset(task_cpu(p), p->cpus_allowed)))
> - set_task_cpu(p, smp_processor_id());
> + set_task_cpu(p, smp_processor_id());
>
> /*
> * Check for pending SIGKILL! The new thread should not be allowed
>
> Could you test and check if it avoids whatever problem you are seeing?
>
>

And this patch will break balance-on-fork.
How about conditionally setting task_cpu if the task's current
CPU is offline?

Thanks,
Nick

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com

2005-05-31 10:41:14

by Srivatsa Vaddagiri

[permalink] [raw]
Subject: Re: [PATCH]CPU hotplug breaks wake_up_new_task

On Tue, May 31, 2005 at 07:46:13PM +1000, Nick Piggin wrote:
> And this patch will break balance-on-fork.

Right :-)

> How about conditionally setting task_cpu if the task's current
> CPU is offline?

Something like this?

--- kernel/fork.c.org 2005-05-31 14:57:15.000000000 +0530
+++ kernel/fork.c 2005-05-31 16:06:41.000000000 +0530
@@ -1024,7 +1024,8 @@ static task_t *copy_process(unsigned lon
* parent's CPU). This avoids alot of nasty races.
*/
p->cpus_allowed = current->cpus_allowed;
- if (unlikely(!cpu_isset(task_cpu(p), p->cpus_allowed)))
+ if (unlikely(!cpu_isset(task_cpu(p), p->cpus_allowed) ||
+ !cpu_online(task_cpu(p))))
set_task_cpu(p, smp_processor_id());

/*



--


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2005-05-31 10:50:02

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH]CPU hotplug breaks wake_up_new_task

Srivatsa Vaddagiri wrote:
> On Tue, May 31, 2005 at 07:46:13PM +1000, Nick Piggin wrote:
>
>>And this patch will break balance-on-fork.
>
>
> Right :-)
>
>
>>How about conditionally setting task_cpu if the task's current
>>CPU is offline?
>
>
> Something like this?
>

That's exactly what I had in mind ;)
Shaohua, do you agree?

--
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com

2005-05-31 11:41:43

by Shaohua Li

[permalink] [raw]
Subject: RE: [PATCH]CPU hotplug breaks wake_up_new_task

>
>Srivatsa Vaddagiri wrote:
>> On Tue, May 31, 2005 at 07:46:13PM +1000, Nick Piggin wrote:
>>
>>>And this patch will break balance-on-fork.
>>
>>
>> Right :-)
>>
>>
>>>How about conditionally setting task_cpu if the task's current
>>>CPU is offline?
>>
>>
>> Something like this?
>>
>
>That's exactly what I had in mind ;)
>Shaohua, do you agree?
That's better. Thanks!

Cheers,
Shaohua