LinuxLists.cc - Re: suspend race -mm regression [Was: Power: fix suspend vt regression]

2009-09-04 11:49:26

Subject: Re: suspend race -mm regression [Was: Power: fix suspend vt regression]

On 08/31/2009 09:32 PM, Rafael J. Wysocki wrote:
> On Monday 31 August 2009, Jiri Slaby wrote:
>> On 08/11/2009 11:19 PM, Jiri Slaby wrote:
>>> However there is still a race or something. Sometimes the suspend goes
>>> through, sometimes it doesn't. I will investigate this further.
>>
>> Hmm, this took a loong time to track down a bit. Code instrumentation by
>> outb(XX, 0x80) usually caused the issue to disappear.
>>
>> However I found out that it's caused by might_sleep() calls in
>> flush_workqueue() and flush_cpu_workqueue(). I.e. it looks like there is
>> a task which deadlocks/spins forever. If we won't reschedule to it,
>> suspend proceeds.
>>
>> I replaced the latter might_sleep() by show_state() and removed
>> refrigerated tasks afterwards. The thing is that I don't know if the
>> prank task is there. I need a scheduler to store "next" task pid or
>> whatever to see what it picked as "next" and so what will run due to
>> might_sched(). I can then show it on port 80 display and read it when
>> the hangup occurs.
>>
>> Depending on which might_sleep(), either flush_workqueue() never (well,
>> at least in next 5 minutes) proceeds to for_each_cpu() or
>> wait_for_completion() in flush_cpu_workqueue() never returns.
>>
>> It's a regression against some -rc1 based -next tree. Bisection
>> impossible, suspend needs to be run even 7 times before it occurs. Maybe
>> a s/might_sleep/yield/ could make it happen earlier (going to try)?
>
> If /sys/class/rtc/rtc0/wakealarm works on this box, you can use it to trigger
> resume in a loop.
>
> Basically, you can do
>
> # echo 0 > /sys/class/rtc/rtc0/wakealarm
> # date +%s -d "+60 seconds" > /sys/class/rtc/rtc0/wakealarm
>
> then go to suspend and it will resume the box in ~1 minute.

Thanks, in the end I found it manually. Goddammit! It's an -mm thing:
cpu_hotplug-dont-affect-current-tasks-affinity.patch

Well, I don't know why, but when the kthread overthere runs under
suspend conditions and gets rescheduled (e.g. by the might_sleep()
inside) it never returns. pick_next_task always returns the idle task
from the idle queue. State of the thread is TASK_RUNNING.

Why is it not enqueued into some queue? I tried also
sched_setscheduler(current, FIFO, 99) in the thread itself. Unless I did
it wrong, it seems like a global scheduler problem?

Ingo, any ideas?

Thanks.

2009-09-04 22:31:00

by Jiri Slaby

[permalink] [raw]

Subject: Re: suspend race -mm regression [Was: Power: fix suspend vt regression]

CCs reduced.

On 09/04/2009 01:49 PM, Jiri Slaby wrote:
> On 08/31/2009 09:32 PM, Rafael J. Wysocki wrote:
>> On Monday 31 August 2009, Jiri Slaby wrote:
>>> On 08/11/2009 11:19 PM, Jiri Slaby wrote:
>>>> However there is still a race or something. Sometimes the suspend goes
>>>> through, sometimes it doesn't. I will investigate this further.
>>>
>>> Hmm, this took a loong time to track down a bit. Code instrumentation by
>>> outb(XX, 0x80) usually caused the issue to disappear.
>>>
>>> However I found out that it's caused by might_sleep() calls in
>>> flush_workqueue() and flush_cpu_workqueue(). I.e. it looks like there is
>>> a task which deadlocks/spins forever. If we won't reschedule to it,
>>> suspend proceeds.
>>>
>>> I replaced the latter might_sleep() by show_state() and removed
>>> refrigerated tasks afterwards. The thing is that I don't know if the
>>> prank task is there. I need a scheduler to store "next" task pid or
>>> whatever to see what it picked as "next" and so what will run due to
>>> might_sched(). I can then show it on port 80 display and read it when
>>> the hangup occurs.
>>>
>>> Depending on which might_sleep(), either flush_workqueue() never (well,
>>> at least in next 5 minutes) proceeds to for_each_cpu() or
>>> wait_for_completion() in flush_cpu_workqueue() never returns.
>>>
>>> It's a regression against some -rc1 based -next tree. Bisection
>>> impossible, suspend needs to be run even 7 times before it occurs. Maybe
>>> a s/might_sleep/yield/ could make it happen earlier (going to try)?
>>
>> If /sys/class/rtc/rtc0/wakealarm works on this box, you can use it to trigger
>> resume in a loop.
>>
>> Basically, you can do
>>
>> # echo 0 > /sys/class/rtc/rtc0/wakealarm
>> # date +%s -d "+60 seconds" > /sys/class/rtc/rtc0/wakealarm
>>
>> then go to suspend and it will resume the box in ~1 minute.
>
> Thanks, in the end I found it manually. Goddammit! It's an -mm thing:
> cpu_hotplug-dont-affect-current-tasks-affinity.patch

BTW. when I reverted it, during suspend I got a warning:
SMP alternatives: switching to UP code
------------[ cut here ]------------
WARNING: at kernel/smp.c:124
__generic_smp_call_function_interrupt+0xfd/0x110()
Hardware name: To Be Filled By O.E.M.
Modules linked in: nfs lockd auth_rpcgss sunrpc ath5k ath
Pid: 3423, comm: pm-suspend Not tainted 2.6.31-rc8-mm1_64 #762
Call Trace:
[<ffffffff8103fc48>] warn_slowpath_common+0x78/0xb0
[<ffffffff8103fc8f>] warn_slowpath_null+0xf/0x20
[<ffffffff8106950d>] __generic_smp_call_function_interrupt+0xfd/0x110
[<ffffffff8106956a>] hotplug_cfd+0x4a/0xa0
[<ffffffff81434e47>] notifier_call_chain+0x47/0x90
[<ffffffff8105b311>] raw_notifier_call_chain+0x11/0x20
[<ffffffff8141ece0>] _cpu_down+0x150/0x2d0
[<ffffffff8104169b>] disable_nonboot_cpus+0xab/0x130
[<ffffffff8106ee3d>] suspend_devices_and_enter+0xad/0x1a0
[<ffffffff8106f00b>] enter_state+0xdb/0xf0
[<ffffffff8106e741>] state_store+0x91/0x100
[<ffffffff8116c157>] kobj_attr_store+0x17/0x20
[<ffffffff8111c6a0>] sysfs_write_file+0xe0/0x160
[<ffffffff810c3ce8>] vfs_write+0xb8/0x1b0
[<ffffffff81434c35>] ? do_page_fault+0x185/0x350
[<ffffffff810c434c>] sys_write+0x4c/0x80
[<ffffffff8100be2b>] system_call_fastpath+0x16/0x1b
---[ end trace 73264e95657dec65 ]---
CPU1 is down

> Well, I don't know why, but when the kthread overthere runs under
> suspend conditions and gets rescheduled (e.g. by the might_sleep()
> inside) it never returns. pick_next_task always returns the idle task
> from the idle queue. State of the thread is TASK_RUNNING.
>
> Why is it not enqueued into some queue? I tried also
> sched_setscheduler(current, FIFO, 99) in the thread itself. Unless I did
> it wrong, it seems like a global scheduler problem?
>
> Ingo, any ideas?

2009-09-04 22:36:26

by Jiri Slaby

[permalink] [raw]

Subject: Re: suspend race -mm regression [Was: Power: fix suspend vt regression]

On 09/05/2009 12:30 AM, Jiri Slaby wrote:
> WARNING: at kernel/smp.c:124
> __generic_smp_call_function_interrupt+0xfd/0x110()
> Hardware name: To Be Filled By O.E.M.
> Modules linked in: nfs lockd auth_rpcgss sunrpc ath5k ath
> Pid: 3423, comm: pm-suspend Not tainted 2.6.31-rc8-mm1_64 #762
> Call Trace:
> [<ffffffff8103fc48>] warn_slowpath_common+0x78/0xb0
> [<ffffffff8103fc8f>] warn_slowpath_null+0xf/0x20
> [<ffffffff8106950d>] __generic_smp_call_function_interrupt+0xfd/0x110
> [<ffffffff8106956a>] hotplug_cfd+0x4a/0xa0
> [<ffffffff81434e47>] notifier_call_chain+0x47/0x90
> [<ffffffff8105b311>] raw_notifier_call_chain+0x11/0x20
> [<ffffffff8141ece0>] _cpu_down+0x150/0x2d0

It's the CPU_DEAD notifier:
ffffffff8141ecd0: 48 83 ce 07 or $0x7,%rsi
ffffffff8141ecd4: 48 c7 c7 08 ff 5d 81 mov
$0xffffffff815dff08,%rdi
ffffffff8141ecdb: e8 20 c6 c3 ff callq ffffffff8105b300
<raw_notifier_call_chain>
ffffffff8141ece0: 3d 02 80 00 00 cmp $0x8002,%eax

2009-09-05 12:39:48

by Jiri Slaby

[permalink] [raw]

Subject: [-mm] warning during suspend [was: suspend race -mm regression]

On 09/05/2009 12:36 AM, Jiri Slaby wrote:
> On 09/05/2009 12:30 AM, Jiri Slaby wrote:
>> WARNING: at kernel/smp.c:124
>> __generic_smp_call_function_interrupt+0xfd/0x110()
>> Hardware name: To Be Filled By O.E.M.
>> Modules linked in: nfs lockd auth_rpcgss sunrpc ath5k ath
>> Pid: 3423, comm: pm-suspend Not tainted 2.6.31-rc8-mm1_64 #762
>> Call Trace:
>> [<ffffffff8103fc48>] warn_slowpath_common+0x78/0xb0
>> [<ffffffff8103fc8f>] warn_slowpath_null+0xf/0x20
>> [<ffffffff8106950d>] __generic_smp_call_function_interrupt+0xfd/0x110
>> [<ffffffff8106956a>] hotplug_cfd+0x4a/0xa0
>> [<ffffffff81434e47>] notifier_call_chain+0x47/0x90
>> [<ffffffff8105b311>] raw_notifier_call_chain+0x11/0x20
>> [<ffffffff8141ece0>] _cpu_down+0x150/0x2d0
>
> It's the CPU_DEAD notifier:
> ffffffff8141ecd0: 48 83 ce 07 or $0x7,%rsi
> ffffffff8141ecd4: 48 c7 c7 08 ff 5d 81 mov
> $0xffffffff815dff08,%rdi
> ffffffff8141ecdb: e8 20 c6 c3 ff callq ffffffff8105b300
> <raw_notifier_call_chain>
> ffffffff8141ece0: 3d 02 80 00 00 cmp $0x8002,%eax

And it's due to:
generic-ipi-fix-the-race-between-generic_smp_call_function_-and-hotplug_cfd.patch

Should the WARN_ONs now warn only when run_callbacks is true?

2009-09-05 13:44:08

by Xiao Guangrong

[permalink] [raw]

Subject: Re: [-mm] warning during suspend [was: suspend race -mm regression]

Jiri Slaby 写道:
> On 09/05/2009 12:36 AM, Jiri Slaby wrote:
>> On 09/05/2009 12:30 AM, Jiri Slaby wrote:
>>> WARNING: at kernel/smp.c:124
>>> __generic_smp_call_function_interrupt+0xfd/0x110()
>>> Hardware name: To Be Filled By O.E.M.
>>> Modules linked in: nfs lockd auth_rpcgss sunrpc ath5k ath
>>> Pid: 3423, comm: pm-suspend Not tainted 2.6.31-rc8-mm1_64 #762
>>> Call Trace:
>>> [<ffffffff8103fc48>] warn_slowpath_common+0x78/0xb0
>>> [<ffffffff8103fc8f>] warn_slowpath_null+0xf/0x20
>>> [<ffffffff8106950d>] __generic_smp_call_function_interrupt+0xfd/0x110
>>> [<ffffffff8106956a>] hotplug_cfd+0x4a/0xa0
>>> [<ffffffff81434e47>] notifier_call_chain+0x47/0x90
>>> [<ffffffff8105b311>] raw_notifier_call_chain+0x11/0x20
>>> [<ffffffff8141ece0>] _cpu_down+0x150/0x2d0
>> It's the CPU_DEAD notifier:
>> ffffffff8141ecd0: 48 83 ce 07 or $0x7,%rsi
>> ffffffff8141ecd4: 48 c7 c7 08 ff 5d 81 mov
>> $0xffffffff815dff08,%rdi
>> ffffffff8141ecdb: e8 20 c6 c3 ff callq ffffffff8105b300
>> <raw_notifier_call_chain>
>> ffffffff8141ece0: 3d 02 80 00 00 cmp $0x8002,%eax
>
> And it's due to:
> generic-ipi-fix-the-race-between-generic_smp_call_function_-and-hotplug_cfd.patch
>

I think it has collision between my patch and below patch:

Commit-ID: 269c861baa2fe7c114c3bc7831292758d29eb336
Gitweb: http://git.kernel.org/tip/269c861baa2fe7c114c3bc7831292758d29eb336
Author: Suresh Siddha <[email protected]>
AuthorDate: Wed, 19 Aug 2009 18:05:35 -0700
Committer: H. Peter Anvin <[email protected]>
CommitDate: Fri, 21 Aug 2009 16:25:43 -0700

generic-ipi: Allow cpus not yet online to call smp_call_function with irqs disabled

My patch is merged at -mm tree, but this patch is base on -tip tree later, so it has this
problem

Suresh, what your opinion?

Thanks,
Xiao

> Should the WARN_ONs now warn only when run_callbacks is true?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2009-09-09 11:41:43

by Jiri Slaby

[permalink] [raw]

Subject: [PATCH 1/1] sched: fix cpu_down deadlock

Jiri Slaby wrote:
> Thanks, in the end I found it manually. Goddammit! It's an -mm thing:
> cpu_hotplug-dont-affect-current-tasks-affinity.patch
>
> Well, I don't know why, but when the kthread overthere runs under
> suspend conditions and gets rescheduled (e.g. by the might_sleep()
> inside) it never returns. pick_next_task always returns the idle task
> from the idle queue. State of the thread is TASK_RUNNING.
>
> Why is it not enqueued into some queue? I tried also
> sched_setscheduler(current, FIFO, 99) in the thread itself. Unless I did
> it wrong, it seems like a global scheduler problem?

Actually not, it definitely seems like a cpu_down problem.

> Ingo, any ideas?

Apparently not, but nevermind :). What about the patch below?

--

After a cpu is taken down in __stop_machine, the kcpu_thread still may be
rescheduled to that cpu, but in fact the cpu is not running at that
moment.

This causes kcpu_thread to never run again, because its enqueued on another
runqueue, hence pick_next_task never selects it on the set of newly
running cpus.

We do set_cpus_allowed_ptr in _cpu_down_thread, but cpu_active_mask is
updated to not contain the cpu which goes down even after the thread finishes
(and _cpu_down returns).

For me this triggers mostly while suspending a SMP machine with
FAIR_GROUP_SCHED enabled and
cpu_hotplug-dont-affect-current-tasks-affinity patch applied. The patch
adds kthread to the cpu_down pipeline.

Fix this issue by eliminating the to-be-killed-cpu from active_cpu
locally.

Signed-off-by: Jiri Slaby <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
kernel/cpu.c | 12 ++++++++++--
1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index be9c5ad..17a3635 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -196,6 +196,14 @@ static int __ref _cpu_down_thread(void *_param)
unsigned long mod = param->mod;
unsigned int cpu = param->cpu;
void *hcpu = (void *)(long)cpu;
+ cpumask_var_t active_mask;
+
+ if (!alloc_cpumask_var(&active_mask, GFP_KERNEL))
+ return -ENOMEM;
+
+ /* make sure we are not running on the cpu which goes down,
+ cpu_active_mask is altered even after we return! */
+ cpumask_andnot(active_mask, cpu_active_mask, cpumask_of(cpu));

cpu_hotplug_begin();
err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE | mod,
@@ -211,7 +219,7 @@ static int __ref _cpu_down_thread(void *_param)
}

/* Ensure that we are not runnable on dying cpu */
- set_cpus_allowed_ptr(current, cpu_active_mask);
+ set_cpus_allowed_ptr(current, active_mask);

err = __stop_machine(take_cpu_down, param, cpumask_of(cpu));
if (err) {
@@ -237,9 +245,9 @@ static int __ref _cpu_down_thread(void *_param)
BUG();

check_for_tasks(cpu);
-
out_release:
cpu_hotplug_done();
+ free_cpumask_var(active_mask);
if (!err) {
if (raw_notifier_call_chain(&cpu_chain, CPU_POST_DEAD | mod,
hcpu) == NOTIFY_BAD)
--
1.6.3.3

2009-09-09 11:54:03

by Peter Zijlstra

[permalink] [raw]

Subject: Re: [PATCH 1/1] sched: fix cpu_down deadlock

On Wed, 2009-09-09 at 13:41 +0200, Jiri Slaby wrote:
> Jiri Slaby wrote:
> > Thanks, in the end I found it manually. Goddammit! It's an -mm thing:
> > cpu_hotplug-dont-affect-current-tasks-affinity.patch

Is there a git tree with -mm in some place? I can't seem to find that
patch in my inbox.

All I can find is some comments from Oleg that the patch looks funny.

2009-09-09 12:28:56

by Jiri Slaby

[permalink] [raw]

Subject: Re: [PATCH 1/1] sched: fix cpu_down deadlock

On 09/09/2009 01:53 PM, Peter Zijlstra wrote:
> On Wed, 2009-09-09 at 13:41 +0200, Jiri Slaby wrote:
>> Jiri Slaby wrote:
>>> Thanks, in the end I found it manually. Goddammit! It's an -mm thing:
>>> cpu_hotplug-dont-affect-current-tasks-affinity.patch
>
> Is there a git tree with -mm in some place? I can't seem to find that
> patch in my inbox.
>
> All I can find is some comments from Oleg that the patch looks funny.

Yes, here:
git://git.zen-sources.org/zen/mmotm.git

Actually I found Oleg came up with better solution to add
move_task_off_dead_cpu to take_cpu_down.

A discussion regarding this is at:
http://lkml.indiana.edu/hypermail/linux/kernel/0907.3/02278.html

So what's the status of the patches, please?

2009-09-09 12:38:14

by Peter Zijlstra

[permalink] [raw]

Subject: Re: [PATCH 1/1] sched: fix cpu_down deadlock

On Wed, 2009-09-09 at 14:23 +0200, Jiri Slaby wrote:
> On 09/09/2009 01:53 PM, Peter Zijlstra wrote:
> > On Wed, 2009-09-09 at 13:41 +0200, Jiri Slaby wrote:
> >> Jiri Slaby wrote:
> >>> Thanks, in the end I found it manually. Goddammit! It's an -mm thing:
> >>> cpu_hotplug-dont-affect-current-tasks-affinity.patch
> >
> > Is there a git tree with -mm in some place? I can't seem to find that
> > patch in my inbox.
> >
> > All I can find is some comments from Oleg that the patch looks funny.
>
> Yes, here:
> git://git.zen-sources.org/zen/mmotm.git

Ah thanks, no wonder I didn't find it.

> Actually I found Oleg came up with better solution to add
> move_task_off_dead_cpu to take_cpu_down.
>
> A discussion regarding this is at:
> http://lkml.indiana.edu/hypermail/linux/kernel/0907.3/02278.html
>
> So what's the status of the patches, please?

Oleg's patch looks good to me.

2009-09-09 13:51:38

by Oleg Nesterov

[permalink] [raw]

Subject: Re: [PATCH 1/1] sched: fix cpu_down deadlock

On 09/09, Jiri Slaby wrote:
>
> On 09/09/2009 01:53 PM, Peter Zijlstra wrote:
> > On Wed, 2009-09-09 at 13:41 +0200, Jiri Slaby wrote:
> >> Jiri Slaby wrote:
> >>> Thanks, in the end I found it manually. Goddammit! It's an -mm thing:
> >>> cpu_hotplug-dont-affect-current-tasks-affinity.patch
> >
> > Is there a git tree with -mm in some place? I can't seem to find that
> > patch in my inbox.
> >
> > All I can find is some comments from Oleg that the patch looks funny.
>
> Yes, here:
> git://git.zen-sources.org/zen/mmotm.git
>
> Actually I found Oleg came up with better solution to add
> move_task_off_dead_cpu to take_cpu_down.
>
> A discussion regarding this is at:
> http://lkml.indiana.edu/hypermail/linux/kernel/0907.3/02278.html
>
> So what's the status of the patches, please?

This patch depends on another one, please see

"[PATCH] cpusets: rework guarantee_online_cpus() to fix deadlock with cpu_down()"
http://marc.info/?t=124910242400002

(as the changelog says, the patch is not complete: we need
->cpumask_lock every time we update cs->allowed, but this
should be trivial)

In short: cpuset_lock() is buggy. But more importantly it is afaics unneeded,
and imho should die. I seem to answer all Lai's questions, but the patch was
ignored by maintainers.

I noticed another race in update_cpumask() which I was going to fix, but since
maintainers ignore me I lost the motivtion ;) Besides, currently I dont have
the time anyway.

So I think the original patch which creates the kthread is the best option.

Oleg.

2009-09-10 20:58:32

by Andrew Morton

[permalink] [raw]

Subject: Re: [-mm] warning during suspend [was: suspend race -mm regression]

On Sat, 05 Sep 2009 22:41:37 +0800
Xiao Guangrong <[email protected]> wrote:

> Jiri Slaby ______:
> > On 09/05/2009 12:36 AM, Jiri Slaby wrote:
> >> On 09/05/2009 12:30 AM, Jiri Slaby wrote:
> >>> WARNING: at kernel/smp.c:124
> >>> __generic_smp_call_function_interrupt+0xfd/0x110()
> >>> Hardware name: To Be Filled By O.E.M.
> >>> Modules linked in: nfs lockd auth_rpcgss sunrpc ath5k ath
> >>> Pid: 3423, comm: pm-suspend Not tainted 2.6.31-rc8-mm1_64 #762
> >>> Call Trace:
> >>> [<ffffffff8103fc48>] warn_slowpath_common+0x78/0xb0
> >>> [<ffffffff8103fc8f>] warn_slowpath_null+0xf/0x20
> >>> [<ffffffff8106950d>] __generic_smp_call_function_interrupt+0xfd/0x110
> >>> [<ffffffff8106956a>] hotplug_cfd+0x4a/0xa0
> >>> [<ffffffff81434e47>] notifier_call_chain+0x47/0x90
> >>> [<ffffffff8105b311>] raw_notifier_call_chain+0x11/0x20
> >>> [<ffffffff8141ece0>] _cpu_down+0x150/0x2d0
> >> It's the CPU_DEAD notifier:
> >> ffffffff8141ecd0: 48 83 ce 07 or $0x7,%rsi
> >> ffffffff8141ecd4: 48 c7 c7 08 ff 5d 81 mov
> >> $0xffffffff815dff08,%rdi
> >> ffffffff8141ecdb: e8 20 c6 c3 ff callq ffffffff8105b300
> >> <raw_notifier_call_chain>
> >> ffffffff8141ece0: 3d 02 80 00 00 cmp $0x8002,%eax
> >
> > And it's due to:
> > generic-ipi-fix-the-race-between-generic_smp_call_function_-and-hotplug_cfd.patch
> >
>
> I think it has collision between my patch and below patch:
>
> Commit-ID: 269c861baa2fe7c114c3bc7831292758d29eb336
> Gitweb: http://git.kernel.org/tip/269c861baa2fe7c114c3bc7831292758d29eb336
> Author: Suresh Siddha <[email protected]>
> AuthorDate: Wed, 19 Aug 2009 18:05:35 -0700
> Committer: H. Peter Anvin <[email protected]>
> CommitDate: Fri, 21 Aug 2009 16:25:43 -0700
>
> generic-ipi: Allow cpus not yet online to call smp_call_function with irqs disabled
>
> My patch is merged at -mm tree, but this patch is base on -tip tree later, so it has this
> problem
>
> Suresh, what your opinion?
>

Suresh appears to be hiding.

Could you please propose a fix for this issue?

2009-09-11 00:01:04

by Suresh Siddha

[permalink] [raw]

Subject: Re: [-mm] warning during suspend [was: suspend race -mm regression]

On Thu, 2009-09-10 at 13:57 -0700, Andrew Morton wrote:
> On Sat, 05 Sep 2009 22:41:37 +0800
> Xiao Guangrong <[email protected]> wrote:
>
> > Jiri Slaby ______:
> > > On 09/05/2009 12:36 AM, Jiri Slaby wrote:
> > >> On 09/05/2009 12:30 AM, Jiri Slaby wrote:
> > >>> WARNING: at kernel/smp.c:124
> > >>> __generic_smp_call_function_interrupt+0xfd/0x110()
> > >>> Hardware name: To Be Filled By O.E.M.
> > >>> Modules linked in: nfs lockd auth_rpcgss sunrpc ath5k ath
> > >>> Pid: 3423, comm: pm-suspend Not tainted 2.6.31-rc8-mm1_64 #762
> > >>> Call Trace:
> > >>> [<ffffffff8103fc48>] warn_slowpath_common+0x78/0xb0
> > >>> [<ffffffff8103fc8f>] warn_slowpath_null+0xf/0x20
> > >>> [<ffffffff8106950d>] __generic_smp_call_function_interrupt+0xfd/0x110
> > >>> [<ffffffff8106956a>] hotplug_cfd+0x4a/0xa0
> > >>> [<ffffffff81434e47>] notifier_call_chain+0x47/0x90
> > >>> [<ffffffff8105b311>] raw_notifier_call_chain+0x11/0x20
> > >>> [<ffffffff8141ece0>] _cpu_down+0x150/0x2d0
> > >> It's the CPU_DEAD notifier:
> > >> ffffffff8141ecd0: 48 83 ce 07 or $0x7,%rsi
> > >> ffffffff8141ecd4: 48 c7 c7 08 ff 5d 81 mov
> > >> $0xffffffff815dff08,%rdi
> > >> ffffffff8141ecdb: e8 20 c6 c3 ff callq ffffffff8105b300
> > >> <raw_notifier_call_chain>
> > >> ffffffff8141ece0: 3d 02 80 00 00 cmp $0x8002,%eax
> > >
> > > And it's due to:
> > > generic-ipi-fix-the-race-between-generic_smp_call_function_-and-hotplug_cfd.patch
> > >
> >
> > I think it has collision between my patch and below patch:

Xiao, I am not sure if the race that you are trying to fix here indeed
exists. Doesn't the stop machine that we do as part of cpu down address
and avoid the race that you mention? Have you seen any real crashes and
hangs or is it theory?

And if even the race exists (which I don't think) calling the interrupt
handler from the cpu down path looks like a hack.

Can you please elaborate why we need this patch? Then we can think of a
cleaner solution if needed.

> >
> > Commit-ID: 269c861baa2fe7c114c3bc7831292758d29eb336
> > Gitweb: http://git.kernel.org/tip/269c861baa2fe7c114c3bc7831292758d29eb336
> > Author: Suresh Siddha <[email protected]>
> > AuthorDate: Wed, 19 Aug 2009 18:05:35 -0700
> > Committer: H. Peter Anvin <[email protected]>
> > CommitDate: Fri, 21 Aug 2009 16:25:43 -0700
> >
> > generic-ipi: Allow cpus not yet online to call smp_call_function with irqs disabled
> >
> > My patch is merged at -mm tree, but this patch is base on -tip tree later, so it has this
> > problem
> >
> > Suresh, what your opinion?
> >
>
> Suresh appears to be hiding.

Not any more. I am back from vacation :(

thanks,
suresh

2009-09-11 06:10:52

by Lai Jiangshan

[permalink] [raw]

Subject: Re: [PATCH 1/1] sched: fix cpu_down deadlock

Jiri Slaby wrote:
> Jiri Slaby wrote:
>> Thanks, in the end I found it manually. Goddammit! It's an -mm thing:
>> cpu_hotplug-dont-affect-current-tasks-affinity.patch
>>
>> Well, I don't know why, but when the kthread overthere runs under
>> suspend conditions and gets rescheduled (e.g. by the might_sleep()
>> inside) it never returns. pick_next_task always returns the idle task
>> from the idle queue. State of the thread is TASK_RUNNING.
>>
>> Why is it not enqueued into some queue? I tried also
>> sched_setscheduler(current, FIFO, 99) in the thread itself. Unless I did
>> it wrong, it seems like a global scheduler problem?
>
> Actually not, it definitely seems like a cpu_down problem.
>
>> Ingo, any ideas?
>
> Apparently not, but nevermind :). What about the patch below?
>
> --
>
> After a cpu is taken down in __stop_machine, the kcpu_thread still may be
> rescheduled to that cpu, but in fact the cpu is not running at that
> moment.
>
> This causes kcpu_thread to never run again, because its enqueued on another
> runqueue, hence pick_next_task never selects it on the set of newly
> running cpus.
>
> We do set_cpus_allowed_ptr in _cpu_down_thread, but cpu_active_mask is
> updated to not contain the cpu which goes down even after the thread finishes
> (and _cpu_down returns).
>
> For me this triggers mostly while suspending a SMP machine with
> FAIR_GROUP_SCHED enabled and
> cpu_hotplug-dont-affect-current-tasks-affinity patch applied. The patch
> adds kthread to the cpu_down pipeline.
>
> Fix this issue by eliminating the to-be-killed-cpu from active_cpu
> locally.
>
> Signed-off-by: Jiri Slaby <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> ---
> kernel/cpu.c | 12 ++++++++++--
> 1 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index be9c5ad..17a3635 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -196,6 +196,14 @@ static int __ref _cpu_down_thread(void *_param)
> unsigned long mod = param->mod;
> unsigned int cpu = param->cpu;
> void *hcpu = (void *)(long)cpu;
> + cpumask_var_t active_mask;
> +
> + if (!alloc_cpumask_var(&active_mask, GFP_KERNEL))
> + return -ENOMEM;
> +
> + /* make sure we are not running on the cpu which goes down,
> + cpu_active_mask is altered even after we return! */
> + cpumask_andnot(active_mask, cpu_active_mask, cpumask_of(cpu));
>
> cpu_hotplug_begin();
> err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE | mod,
> @@ -211,7 +219,7 @@ static int __ref _cpu_down_thread(void *_param)
> }
>
> /* Ensure that we are not runnable on dying cpu */
> - set_cpus_allowed_ptr(current, cpu_active_mask);
> + set_cpus_allowed_ptr(current, active_mask);
>
> err = __stop_machine(take_cpu_down, param, cpumask_of(cpu));
> if (err) {
> @@ -237,9 +245,9 @@ static int __ref _cpu_down_thread(void *_param)
> BUG();
>
> check_for_tasks(cpu);
> -
> out_release:
> cpu_hotplug_done();
> + free_cpumask_var(active_mask);
> if (!err) {
> if (raw_notifier_call_chain(&cpu_chain, CPU_POST_DEAD | mod,
> hcpu) == NOTIFY_BAD)

Hi, Jiri Slaby

Does this bug occur when a cpu is being offlined or
when the system is being suspended?
Or Both?

Lai

2009-09-11 06:28:51

by Jiri Slaby

[permalink] [raw]

Subject: Re: [PATCH 1/1] sched: fix cpu_down deadlock

On 09/11/2009 08:09 AM, Lai Jiangshan wrote:
> Does this bug occur when a cpu is being offlined or
> when the system is being suspended?
> Or Both?

Hi, I tried echo 0/1 > /sys/devices/system/cpu/cpu1/online in a loop,
but it didn't trigger the bug. It happened only on suspend/resume cycle
(in the end I found even swsusp in qemu suffers from this).

2009-09-11 07:39:01

by Lai Jiangshan

[permalink] [raw]

Subject: Re: [PATCH 1/1] sched: fix cpu_down deadlock

Jiri Slaby wrote:
> On 09/11/2009 08:09 AM, Lai Jiangshan wrote:
>> Does this bug occur when a cpu is being offlined or
>> when the system is being suspended?
>> Or Both?
>
> Hi, I tried echo 0/1 > /sys/devices/system/cpu/cpu1/online in a loop,
> but it didn't trigger the bug. It happened only on suspend/resume cycle
> (in the end I found even swsusp in qemu suffers from this).
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>

OK, I knew where this bug is.

I thought the corresponding bit in cpu_active_mask is cleared before
_cpu_down(), but I missed the system-suspend path:disable_nonboot_cpus().

There is a bug in disable_nonboot_cpus() even if my patch is removed.

cpu_active_map is wrong during suspending.(scheduler system who uses
cpu_active_map are still working while suspending)

You need:

int disable_nonboot_cpus(void)
{
....
....

/*
* You need adding 'set_cpu_active(cpu, false);' here
* to fix this bug and make my patch works well.
*/
error = _cpu_down(cpu, 1);

....
....
}

Lai.

2009-09-11 07:55:54

by Xiao Guangrong

[permalink] [raw]

Subject: Re: [-mm] warning during suspend [was: suspend race -mm regression]

Suresh Siddha wrote:
> On Thu, 2009-09-10 at 13:57 -0700, Andrew Morton wrote:
>> On Sat, 05 Sep 2009 22:41:37 +0800
>> Xiao Guangrong <[email protected]> wrote:
>>
>>> Jiri Slaby ______:
>>>> On 09/05/2009 12:36 AM, Jiri Slaby wrote:
>>>>> On 09/05/2009 12:30 AM, Jiri Slaby wrote:
>>>>>> WARNING: at kernel/smp.c:124
>>>>>> __generic_smp_call_function_interrupt+0xfd/0x110()
>>>>>> Hardware name: To Be Filled By O.E.M.
>>>>>> Modules linked in: nfs lockd auth_rpcgss sunrpc ath5k ath
>>>>>> Pid: 3423, comm: pm-suspend Not tainted 2.6.31-rc8-mm1_64 #762
>>>>>> Call Trace:
>>>>>> [<ffffffff8103fc48>] warn_slowpath_common+0x78/0xb0
>>>>>> [<ffffffff8103fc8f>] warn_slowpath_null+0xf/0x20
>>>>>> [<ffffffff8106950d>] __generic_smp_call_function_interrupt+0xfd/0x110
>>>>>> [<ffffffff8106956a>] hotplug_cfd+0x4a/0xa0
>>>>>> [<ffffffff81434e47>] notifier_call_chain+0x47/0x90
>>>>>> [<ffffffff8105b311>] raw_notifier_call_chain+0x11/0x20
>>>>>> [<ffffffff8141ece0>] _cpu_down+0x150/0x2d0
>>>>> It's the CPU_DEAD notifier:
>>>>> ffffffff8141ecd0: 48 83 ce 07 or $0x7,%rsi
>>>>> ffffffff8141ecd4: 48 c7 c7 08 ff 5d 81 mov
>>>>> $0xffffffff815dff08,%rdi
>>>>> ffffffff8141ecdb: e8 20 c6 c3 ff callq ffffffff8105b300
>>>>> <raw_notifier_call_chain>
>>>>> ffffffff8141ece0: 3d 02 80 00 00 cmp $0x8002,%eax
>>>> And it's due to:
>>>> generic-ipi-fix-the-race-between-generic_smp_call_function_-and-hotplug_cfd.patch
>>>>
>>> I think it has collision between my patch and below patch:
>
> Xiao, I am not sure if the race that you are trying to fix here indeed
> exists. Doesn't the stop machine that we do as part of cpu down address
> and avoid the race that you mention? Have you seen any real crashes and
> hangs or is it theory?
>

Suresh, please see my explanation in another mail, see below URL:
http://marc.info/?l=linux-kernel&m=125265516529139&w=2

Thanks,
Xiao