2009-12-22 00:17:55

by Eric Paris

[permalink] [raw]
Subject: 2.6.33-rc1 unusable due to scheduler issues, circular locking, WARNs and BUGs

Trying to build a kernel on a 48 core x86_64 box using make -j 64 and
I'm exploding in the scheduler. I'm running (and building) kernel
f7b84a6ba7eaeba4e1df8feddca1473a7db369a5 There are three distinct
signatures of problems. Some boots I'll see all 3 of these failures
sometimes only 1 or 2 of them. That's the reason they are kinda split
up in dmesg.

1) gcc/3141 is trying to acquire lock:
(&(&sem->wait_lock)->rlock){......}, at: [<ffffffff81223234>] __down_read_trylock+0x13/0x46

but task is already holding lock:
(&rq->lock){-.-.-.}, at: [<ffffffff8103dd2d>] task_rq_lock+0x51/0x83

2) WARN() in kernel/sched_fair.c:1001 hrtick_start_fair()

3) NULL pointer dereference at 0000000000000168 in check_preempt_wakeup
kernel/sched_fair.c

Full backtraces are in the attached dmesg.


Attachments:
dmesg (73.78 kB)
config (101.37 kB)
Download all attachments

2009-12-22 05:47:31

by Xiaotian Feng

[permalink] [raw]
Subject: Re: 2.6.33-rc1 unusable due to scheduler issues, circular locking, WARNs and BUGs

Does a revert of cd29fe6f2637cc2ccbda5ac65f5332d6bf5fa3c6 fix this problem?

On Tue, Dec 22, 2009 at 8:17 AM, Eric Paris <[email protected]> wrote:
> Trying to build a kernel on a 48 core x86_64 box using make -j 64 and
> I'm exploding in the scheduler.  I'm running (and building) kernel
> f7b84a6ba7eaeba4e1df8feddca1473a7db369a5  There are three distinct
> signatures of problems.  Some boots I'll see all 3 of these failures
> sometimes only 1 or 2 of them.  That's the reason they are kinda split
> up in dmesg.
>
> 1) gcc/3141 is trying to acquire lock:
>  (&(&sem->wait_lock)->rlock){......}, at: [<ffffffff81223234>] __down_read_trylock+0x13/0x46
>
> but task is already holding lock:
>  (&rq->lock){-.-.-.}, at: [<ffffffff8103dd2d>] task_rq_lock+0x51/0x83
>
> 2) WARN() in kernel/sched_fair.c:1001 hrtick_start_fair()
>
> 3) NULL pointer dereference at 0000000000000168 in check_preempt_wakeup
>      kernel/sched_fair.c
>
> Full backtraces are in the attached dmesg.
>

2009-12-22 14:31:55

by Eric Paris

[permalink] [raw]
Subject: Re: 2.6.33-rc1 unusable due to scheduler issues, circular locking, WARNs and BUGs

On Tue, 2009-12-22 at 09:48 +0100, Peter Zijlstra wrote:
> On Mon, 2009-12-21 at 19:17 -0500, Eric Paris wrote:
> > Trying to build a kernel on a 48 core x86_64 box using make -j 64 and
> > I'm exploding in the scheduler. I'm running (and building) kernel
> > f7b84a6ba7eaeba4e1df8feddca1473a7db369a5 There are three distinct
> > signatures of problems. Some boots I'll see all 3 of these failures
> > sometimes only 1 or 2 of them. That's the reason they are kinda split
> > up in dmesg.

Appears the kernel built with no oops, circular locking, or bugs. Only
thing in dmesg during the build was:

DMA-API: debugging out of memory - disabling

I'm going to do it a number more times to be sure, but I had 100%
failure rate before. Sounds like this patch has got to go.

-Eric

> >
> > 1) gcc/3141 is trying to acquire lock:
> > (&(&sem->wait_lock)->rlock){......}, at: [<ffffffff81223234>] __down_read_trylock+0x13/0x46
> >
> > but task is already holding lock:
> > (&rq->lock){-.-.-.}, at: [<ffffffff8103dd2d>] task_rq_lock+0x51/0x83
>
> This is due to the pagefalut happening while holding the rq->lock, so
> its an artefact of 3).
>
> > 2) WARN() in kernel/sched_fair.c:1001 hrtick_start_fair()
>
> Worrying, but probably due to the same problem as 3)
>
> > 3) NULL pointer dereference at 0000000000000168 in check_preempt_wakeup
> > kernel/sched_fair.c
>
> Right, hard to tell where exactly it goes bang, but could you please try
> reverting the below patch.
>
> What I suspect happens is that we his the task_cpu(p)==cpu case, we then
> don't do __set_task_cpu()->set_task_rq(), which sets the group
> scheduling pointers (you seem to have cgroup scheduling enabled).
>
> If those pointers are wild all kinds of interesting bits can happen,
> including 3) and possibly 2).
>
> If this revert doesn't help, could you please also provide the output of
> addr2line -e vmlinux <FAULT_IP> ?
>
> ---
> commit 738d2be4301007f054541c5c4bf7fb6a361c9b3a
> Author: Peter Zijlstra <[email protected]>
> Date: Wed Dec 16 18:04:42 2009 +0100
>
> sched: Simplify set_task_cpu()
>
> Rearrange code a bit now that its a simpler function.
>
> Signed-off-by: Peter Zijlstra <[email protected]>
> Cc: Mike Galbraith <[email protected]>
> LKML-Reference: <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index f92ce63..8a2bfd3 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -2034,11 +2034,8 @@ task_hot(struct task_struct *p, u64 now, struct sched_domain *sd)
> return delta < (s64)sysctl_sched_migration_cost;
> }
>
> -
> void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
> {
> - int old_cpu = task_cpu(p);
> -
> #ifdef CONFIG_SCHED_DEBUG
> /*
> * We should never call set_task_cpu() on a blocked task,
> @@ -2049,11 +2046,11 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>
> trace_sched_migrate_task(p, new_cpu);
>
> - if (old_cpu != new_cpu) {
> - p->se.nr_migrations++;
> - perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS,
> - 1, 1, NULL, 0);
> - }
> + if (task_cpu(p) == new_cpu)
> + return;
> +
> + p->se.nr_migrations++;
> + perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 1, NULL, 0);
>
> __set_task_cpu(p, new_cpu);
> }
>
>

2009-12-22 07:19:57

by Cong Wang

[permalink] [raw]
Subject: Re: 2.6.33-rc1 unusable due to scheduler issues, circular locking, WARNs and BUGs

[Fix top-posting]

On Tue, Dec 22, 2009 at 1:42 PM, Xiaotian Feng <[email protected]> wrote:
>
> On Tue, Dec 22, 2009 at 8:17 AM, Eric Paris <[email protected]> wrote:
>> Trying to build a kernel on a 48 core x86_64 box using make -j 64 and
>> I'm exploding in the scheduler.  I'm running (and building) kernel
>> f7b84a6ba7eaeba4e1df8feddca1473a7db369a5  There are three distinct
>> signatures of problems.  Some boots I'll see all 3 of these failures
>> sometimes only 1 or 2 of them.  That's the reason they are kinda split
>> up in dmesg.
>>
>> 1) gcc/3141 is trying to acquire lock:
>>  (&(&sem->wait_lock)->rlock){......}, at: [<ffffffff81223234>] __down_read_trylock+0x13/0x46
>>
>> but task is already holding lock:
>>  (&rq->lock){-.-.-.}, at: [<ffffffff8103dd2d>] task_rq_lock+0x51/0x83
>>
>> 2) WARN() in kernel/sched_fair.c:1001 hrtick_start_fair()
>>
>> 3) NULL pointer dereference at 0000000000000168 in check_preempt_wakeup
>>      kernel/sched_fair.c
>>
>> Full backtraces are in the attached dmesg.
>>
> Does a revert of cd29fe6f2637cc2ccbda5ac65f5332d6bf5fa3c6 fix this problem?


I don't think so...

I think the most suspicious commit here is ab19cb23. It kicked
"local_irq_save()"
out, which means if the task is selected to run on another cpu which doesn't
disable irq, we will have a page fault, thun we will try to hold mm->mmap_sem
while we are holding rq->lock already.

Does the following untested patch fix the problem?

NOT-signed-off-by: WANG Cong <[email protected]>

------
diff --git a/kernel/sched.c b/kernel/sched.c
index 87f1f47..221ab59 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2408,13 +2408,13 @@ static int try_to_wake_up(struct task_struct
*p, unsigned int state,
if (p->sched_class->task_waking)
p->sched_class->task_waking(rq, p);

- __task_rq_unlock(rq);
+ task_rq_unlock(rq);

cpu = select_task_rq(p, SD_BALANCE_WAKE, wake_flags);
if (cpu != orig_cpu)
set_task_cpu(p, cpu);

- rq = __task_rq_lock(p);
+ rq = task_rq_lock(p);
update_rq_clock(rq);

WARN_ON(p->state != TASK_WAKING);

2009-12-22 07:41:40

by Xiaotian Feng

[permalink] [raw]
Subject: Re: 2.6.33-rc1 unusable due to scheduler issues, circular locking, WARNs and BUGs

On Tue, Dec 22, 2009 at 3:19 PM, Américo Wang <[email protected]> wrote:
> [Fix top-posting]
>
> On Tue, Dec 22, 2009 at 1:42 PM, Xiaotian Feng <[email protected]> wrote:
>>
>> On Tue, Dec 22, 2009 at 8:17 AM, Eric Paris <[email protected]> wrote:
>>> Trying to build a kernel on a 48 core x86_64 box using make -j 64 and
>>> I'm exploding in the scheduler.  I'm running (and building) kernel
>>> f7b84a6ba7eaeba4e1df8feddca1473a7db369a5  There are three distinct
>>> signatures of problems.  Some boots I'll see all 3 of these failures
>>> sometimes only 1 or 2 of them.  That's the reason they are kinda split
>>> up in dmesg.
>>>
>>> 1) gcc/3141 is trying to acquire lock:
>>>  (&(&sem->wait_lock)->rlock){......}, at: [<ffffffff81223234>] __down_read_trylock+0x13/0x46
>>>
>>> but task is already holding lock:
>>>  (&rq->lock){-.-.-.}, at: [<ffffffff8103dd2d>] task_rq_lock+0x51/0x83
>>>
>>> 2) WARN() in kernel/sched_fair.c:1001 hrtick_start_fair()
>>>
>>> 3) NULL pointer dereference at 0000000000000168 in check_preempt_wakeup
>>>      kernel/sched_fair.c
>>>
>>> Full backtraces are in the attached dmesg.
>>>
>> Does a revert of cd29fe6f2637cc2ccbda5ac65f5332d6bf5fa3c6 fix this problem?
>
>
> I don't think so...
>
> I think the most suspicious commit here is ab19cb23. It kicked
> "local_irq_save()"
> out, which means if the task is selected to run on another cpu which doesn't
> disable irq, we will have a page fault, thun we will try to hold mm->mmap_sem
> while we are holding rq->lock already.

The page fault is from kernel NULL pointer deref. You should connect
the lockdep warning and kernel BUG together.

>
> Does the following untested patch fix the problem?
>
> NOT-signed-off-by: WANG Cong <[email protected]>
>
> ------
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 87f1f47..221ab59 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -2408,13 +2408,13 @@ static int try_to_wake_up(struct task_struct
> *p, unsigned int state,
>        if (p->sched_class->task_waking)
>                p->sched_class->task_waking(rq, p);
>
> -       __task_rq_unlock(rq);
> +       task_rq_unlock(rq);
>
>        cpu = select_task_rq(p, SD_BALANCE_WAKE, wake_flags);
>        if (cpu != orig_cpu)
>                set_task_cpu(p, cpu);
>
> -       rq = __task_rq_lock(p);
> +       rq = task_rq_lock(p);
>        update_rq_clock(rq);
>
>        WARN_ON(p->state != TASK_WAKING);
>

2009-12-22 07:50:34

by Cong Wang

[permalink] [raw]
Subject: Re: 2.6.33-rc1 unusable due to scheduler issues, circular locking, WARNs and BUGs

On Tue, Dec 22, 2009 at 3:41 PM, Xiaotian Feng <[email protected]> wrote:
> On Tue, Dec 22, 2009 at 3:19 PM, Américo Wang <[email protected]> wrote:
>> [Fix top-posting]
>>
>> On Tue, Dec 22, 2009 at 1:42 PM, Xiaotian Feng <[email protected]> wrote:
>>>
>>> On Tue, Dec 22, 2009 at 8:17 AM, Eric Paris <[email protected]> wrote:
>>>> Trying to build a kernel on a 48 core x86_64 box using make -j 64 and
>>>> I'm exploding in the scheduler.  I'm running (and building) kernel
>>>> f7b84a6ba7eaeba4e1df8feddca1473a7db369a5  There are three distinct
>>>> signatures of problems.  Some boots I'll see all 3 of these failures
>>>> sometimes only 1 or 2 of them.  That's the reason they are kinda split
>>>> up in dmesg.
>>>>
>>>> 1) gcc/3141 is trying to acquire lock:
>>>>  (&(&sem->wait_lock)->rlock){......}, at: [<ffffffff81223234>] __down_read_trylock+0x13/0x46
>>>>
>>>> but task is already holding lock:
>>>>  (&rq->lock){-.-.-.}, at: [<ffffffff8103dd2d>] task_rq_lock+0x51/0x83
>>>>
>>>> 2) WARN() in kernel/sched_fair.c:1001 hrtick_start_fair()
>>>>
>>>> 3) NULL pointer dereference at 0000000000000168 in check_preempt_wakeup
>>>>      kernel/sched_fair.c
>>>>
>>>> Full backtraces are in the attached dmesg.
>>>>
>>> Does a revert of cd29fe6f2637cc2ccbda5ac65f5332d6bf5fa3c6 fix this problem?
>>
>>
>> I don't think so...
>>
>> I think the most suspicious commit here is ab19cb23. It kicked
>> "local_irq_save()"
>> out, which means if the task is selected to run on another cpu which doesn't
>> disable irq, we will have a page fault, thun we will try to hold mm->mmap_sem
>> while we are holding rq->lock already.
>
> The page fault is from kernel  NULL pointer deref.  You should connect
> the lockdep warning and kernel BUG together.
>

Interesting.

1) Doesn't this NULL ptr def expose that we have a potential problem?

2) For NULL ptr def problem, commit 3a7e73a2e2 seems more suspicious..

2009-12-22 08:35:07

by Xiaotian Feng

[permalink] [raw]
Subject: Re: 2.6.33-rc1 unusable due to scheduler issues, circular locking, WARNs and BUGs

On Tue, Dec 22, 2009 at 3:50 PM, Américo Wang <[email protected]> wrote:
> On Tue, Dec 22, 2009 at 3:41 PM, Xiaotian Feng <[email protected]> wrote:
>> On Tue, Dec 22, 2009 at 3:19 PM, Américo Wang <[email protected]> wrote:
>>> [Fix top-posting]
>>>
>>> On Tue, Dec 22, 2009 at 1:42 PM, Xiaotian Feng <[email protected]> wrote:
>>>>
>>>> On Tue, Dec 22, 2009 at 8:17 AM, Eric Paris <[email protected]> wrote:
>>>>> Trying to build a kernel on a 48 core x86_64 box using make -j 64 and
>>>>> I'm exploding in the scheduler.  I'm running (and building) kernel
>>>>> f7b84a6ba7eaeba4e1df8feddca1473a7db369a5  There are three distinct
>>>>> signatures of problems.  Some boots I'll see all 3 of these failures
>>>>> sometimes only 1 or 2 of them.  That's the reason they are kinda split
>>>>> up in dmesg.
>>>>>
>>>>> 1) gcc/3141 is trying to acquire lock:
>>>>>  (&(&sem->wait_lock)->rlock){......}, at: [<ffffffff81223234>] __down_read_trylock+0x13/0x46
>>>>>
>>>>> but task is already holding lock:
>>>>>  (&rq->lock){-.-.-.}, at: [<ffffffff8103dd2d>] task_rq_lock+0x51/0x83
>>>>>
>>>>> 2) WARN() in kernel/sched_fair.c:1001 hrtick_start_fair()
>>>>>
>>>>> 3) NULL pointer dereference at 0000000000000168 in check_preempt_wakeup
>>>>>      kernel/sched_fair.c
>>>>>
>>>>> Full backtraces are in the attached dmesg.
>>>>>
>>>> Does a revert of cd29fe6f2637cc2ccbda5ac65f5332d6bf5fa3c6 fix this problem?
>>>
>>>
>>> I don't think so...
>>>
>>> I think the most suspicious commit here is ab19cb23. It kicked
>>> "local_irq_save()"
>>> out, which means if the task is selected to run on another cpu which doesn't
>>> disable irq, we will have a page fault, thun we will try to hold mm->mmap_sem
>>> while we are holding rq->lock already.
>>
>> The page fault is from kernel  NULL pointer deref.  You should connect
>> the lockdep warning and kernel BUG together.
>>
>
> Interesting.
>
> 1) Doesn't this NULL ptr def expose that we have a potential problem?
>
> 2) For NULL ptr def problem, commit 3a7e73a2e2 seems more suspicious..

I don't think so,
(gdb) l *check_preempt_wakeup+0x170
0xffffffff8103c815 is in is_same_group (kernel/sched_fair.c:154).
(gdb) assemble check_preempt_wakeup
<snip>
0xffffffff8103c815 <is_same_group+0>: mov 0x168(%rsi),%rax
<snip>

The panic is from NULL pointer deref at 0000000000000168, so some time
in is_same_group() while loop,
parent_entity(*pse) = NULL, then is_same_group() trying to visit
pse->cfs_rq, NULL pointer deref was triggered.

commit 3a7e73, the behaviour for find_matching_se() is same as before,
this commit should not be the buggy one.

>

2009-12-22 08:49:27

by Peter Zijlstra

[permalink] [raw]
Subject: Re: 2.6.33-rc1 unusable due to scheduler issues, circular locking, WARNs and BUGs

On Mon, 2009-12-21 at 19:17 -0500, Eric Paris wrote:
> Trying to build a kernel on a 48 core x86_64 box using make -j 64 and
> I'm exploding in the scheduler. I'm running (and building) kernel
> f7b84a6ba7eaeba4e1df8feddca1473a7db369a5 There are three distinct
> signatures of problems. Some boots I'll see all 3 of these failures
> sometimes only 1 or 2 of them. That's the reason they are kinda split
> up in dmesg.
>
> 1) gcc/3141 is trying to acquire lock:
> (&(&sem->wait_lock)->rlock){......}, at: [<ffffffff81223234>] __down_read_trylock+0x13/0x46
>
> but task is already holding lock:
> (&rq->lock){-.-.-.}, at: [<ffffffff8103dd2d>] task_rq_lock+0x51/0x83

This is due to the pagefalut happening while holding the rq->lock, so
its an artefact of 3).

> 2) WARN() in kernel/sched_fair.c:1001 hrtick_start_fair()

Worrying, but probably due to the same problem as 3)

> 3) NULL pointer dereference at 0000000000000168 in check_preempt_wakeup
> kernel/sched_fair.c

Right, hard to tell where exactly it goes bang, but could you please try
reverting the below patch.

What I suspect happens is that we his the task_cpu(p)==cpu case, we then
don't do __set_task_cpu()->set_task_rq(), which sets the group
scheduling pointers (you seem to have cgroup scheduling enabled).

If those pointers are wild all kinds of interesting bits can happen,
including 3) and possibly 2).

If this revert doesn't help, could you please also provide the output of
addr2line -e vmlinux <FAULT_IP> ?

---
commit 738d2be4301007f054541c5c4bf7fb6a361c9b3a
Author: Peter Zijlstra <[email protected]>
Date: Wed Dec 16 18:04:42 2009 +0100

sched: Simplify set_task_cpu()

Rearrange code a bit now that its a simpler function.

Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

diff --git a/kernel/sched.c b/kernel/sched.c
index f92ce63..8a2bfd3 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2034,11 +2034,8 @@ task_hot(struct task_struct *p, u64 now, struct sched_domain *sd)
return delta < (s64)sysctl_sched_migration_cost;
}

-
void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
{
- int old_cpu = task_cpu(p);
-
#ifdef CONFIG_SCHED_DEBUG
/*
* We should never call set_task_cpu() on a blocked task,
@@ -2049,11 +2046,11 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)

trace_sched_migrate_task(p, new_cpu);

- if (old_cpu != new_cpu) {
- p->se.nr_migrations++;
- perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS,
- 1, 1, NULL, 0);
- }
+ if (task_cpu(p) == new_cpu)
+ return;
+
+ p->se.nr_migrations++;
+ perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 1, NULL, 0);

__set_task_cpu(p, new_cpu);
}

2009-12-22 11:29:32

by Arjan van de Ven

[permalink] [raw]
Subject: Re: 2.6.33-rc1 unusable due to scheduler issues, circular locking, WARNs and BUGs

On Tue, 22 Dec 2009 09:48:40 +0100
Peter Zijlstra <[email protected]> wrote:
>
> If this revert doesn't help, could you please also provide the output
> of addr2line -e vmlinux <FAULT_IP> ?

or even better, use scripts/markup_oops.pl to get a full disassembly +
C code of where the crash happens

2009-12-22 14:42:07

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH] sched:

As demonstrated by Eric, we really need to call __set_task_cpu() early
in the fork() path to properly initialize the various task state --
specifically the cgroup state through set_task_rq().

[ we could probably fix this by explicitly calling __set_task_cpu() from
sched_fork(), but lets try that for the next cycle and simply revert
to the old behaviour for now. ]

Reported-by: Eric Paris <[email protected]>
Tested-by: Eric Paris <[email protected]>,
Signed-off-by: Peter Zijlstra <[email protected]>
---
kernel/sched.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2067,11 +2067,10 @@ void set_task_cpu(struct task_struct *p,

trace_sched_migrate_task(p, new_cpu);

- if (task_cpu(p) == new_cpu)
- return;
-
- p->se.nr_migrations++;
- perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 1, NULL, 0);
+ if (task_cpu(p) != new_cpu) {
+ p->se.nr_migrations++;
+ perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 1, NULL, 0);
+ }

__set_task_cpu(p, new_cpu);
}

2009-12-22 14:43:59

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH] sched: Revert 738d2be, Simplify set_task_cpu()

[ and now with Subject ]

Effectively reverts 738d2be4301007f054541c5c4bf7fb6a361c9b3a.

As demonstrated by Eric, we really need to call __set_task_cpu() early
in the fork() path to properly initialize the various task state --
specifically the cgroup state through set_task_rq().

[ we could probably fix this by explicitly calling __set_task_cpu() from
sched_fork(), but lets try that for the next cycle and simply revert
to the old behaviour for now. ]

Reported-by: Eric Paris <[email protected]>
Tested-by: Eric Paris <[email protected]>,
Signed-off-by: Peter Zijlstra <[email protected]>
---
kernel/sched.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2067,11 +2067,10 @@ void set_task_cpu(struct task_struct *p,

trace_sched_migrate_task(p, new_cpu);

- if (task_cpu(p) == new_cpu)
- return;
-
- p->se.nr_migrations++;
- perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 1, NULL, 0);
+ if (task_cpu(p) != new_cpu) {
+ p->se.nr_migrations++;
+ perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 1, NULL, 0);
+ }

__set_task_cpu(p, new_cpu);
}


2009-12-23 09:07:22

by Peter Zijlstra

[permalink] [raw]
Subject: [tip:sched/urgent] sched: Revert 738d2be, simplify set_task_cpu()

Commit-ID: 0c69774e6ce94364cfaa8bdeb18061edc414bc5a
Gitweb: http://git.kernel.org/tip/0c69774e6ce94364cfaa8bdeb18061edc414bc5a
Author: Peter Zijlstra <[email protected]>
AuthorDate: Tue, 22 Dec 2009 15:43:19 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 23 Dec 2009 10:04:10 +0100

sched: Revert 738d2be, simplify set_task_cpu()

Effectively reverts 738d2be4301007f054541c5c4bf7fb6a361c9b3a.

As demonstrated by Eric, we really need to call __set_task_cpu()
early in the fork() path to properly initialize the various task
state -- specifically the cgroup state through set_task_rq().

[ we could probably fix this by explicitly calling
__set_task_cpu() from sched_fork(), but lets try that for the
next cycle and simply revert to the old behaviour for now. ]

Reported-by: Eric Paris <[email protected]>
Tested-by: Eric Paris <[email protected]>,
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: [email protected]
LKML-Reference: <1261492999.4937.36.camel@laptop>
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched.c | 9 ++++-----
1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 87f1f47..c535cc4 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2045,11 +2045,10 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)

trace_sched_migrate_task(p, new_cpu);

- if (task_cpu(p) == new_cpu)
- return;
-
- p->se.nr_migrations++;
- perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 1, NULL, 0);
+ if (task_cpu(p) != new_cpu) {
+ p->se.nr_migrations++;
+ perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 1, NULL, 0);
+ }

__set_task_cpu(p, new_cpu);
}