2007-05-29 17:40:43

by Mark Hounschell

Subject: floppy.c soft lockup

Changes in floppy.c between 2.6.17 and 2.6.18 have broken an application I have. I have tracked
it down to a single line of code. When the following patch is applied to the version in 2.6.18,
my application works.

--- linux-2.6.18/drivers/block/floppy.c 2006-09-19 23:42:06.000000000 -0400
+++ linux-2.6.18-crt/drivers/block/floppy.c 2007-05-29 09:12:20.000000000 -0400
@@ -893,7 +893,6 @@
set_current_state(TASK_RUNNING);
remove_wait_queue(&fdc_wait, &wait);

- flush_scheduled_work();
}
command_status = FD_COMMAND_NONE;

I don't claim to understand the changes from 2.6.17 to 2.6.18 except for the devfs removal.
All I can say is this one line of code kills the application. I have tried to write a short program
that reproduces my problem, but everything else I write seems to work. The application only runs
on SMP machines and uses process and irq affinities with real-time scheduling. When I turn
off process and irq affinities the application runs.

I have tried kernels up through 2.6.21.1 with the same results. All kernels from 2.6.18 up
require that I remove this one line of code, or my application does not work.

Regards
Mark




2007-05-31 05:47:18

by Andrew Morton

Subject: Re: floppy.c soft lockup

On Tue, 29 May 2007 13:31:05 -0400 Mark Hounschell <[email protected]> wrote:

> Changes in floppy.c between 2.6.17 and 2.6.18 have broken an application I have. I have tracked
> it down to a single line of code. When the following patch is applied to the version in 2.6.18,
> my application works.
>
> --- linux-2.6.18/drivers/block/floppy.c 2006-09-19 23:42:06.000000000 -0400
> +++ linux-2.6.18-crt/drivers/block/floppy.c 2007-05-29 09:12:20.000000000 -0400
> @@ -893,7 +893,6 @@
> set_current_state(TASK_RUNNING);
> remove_wait_queue(&fdc_wait, &wait);
>
> - flush_scheduled_work();
> }
> command_status = FD_COMMAND_NONE;
>
> I don't claim to understand the changes from 2.6.17 to 2.6.18 except for the devfs removal.
> All I can say is this one line of code kills the application. I have tried to write a short program
> that reproduces my problem, but everything else I write seems to work. The application only runs
> on SMP machines and uses process and irq affinities with real-time scheduling. When I turn
> off process and irq affinities the application runs.
>
> I have tried kernels up through 2.6.21.1 with the same results. All kernels from 2.6.18 up
> require that I remove this one line of code, or my application does not work.

Interesting. I'd expect that the calling process is spinning with realtime
policy and expecting some other process to do something (i.e. run a workqueue).

If you keep the process and irq affinities, and disable the realtime policy
does that also prevent the problem?

It would be interesting if you could capture a few task traces while it is stuck:
echo 1 > /proc/sys/kernel/sysrq, then do ALT-SYSRQ-P a bunch of times and ALT-SYSRQ-T,
and see if you can work out where the CPU is stuck.
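
[Editorial note: the same dumps can be requested from a shell via /proc/sysrq-trigger instead of the console keyboard, which helps when the machine is remote. This is the standard kernel magic-SysRq interface, not anything specific to this bug; it must be run as root, and the output lands in the kernel log.]

```shell
# Enable the magic SysRq interface.
echo 1 > /proc/sys/kernel/sysrq
# 'p' dumps registers/backtrace for the current CPU; repeat it a few times.
echo p > /proc/sysrq-trigger
# 't' dumps the stack of every task on the system.
echo t > /proc/sysrq-trigger
# Inspect the captured traces.
dmesg | tail -n 200
```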

Also, 2.6.22-rc3 might have accidentally fixed this.

2007-05-31 14:28:38

by Mark Hounschell

Subject: Re: floppy.c soft lockup

May 31 10:13:11 harley kernel: =======================
May 31 10:13:11 harley kernel: vrsx S E6513DD8 0 12647 12288 (NOTLB)
May 31 10:13:11 harley kernel: e6513dec 00200082 00000002 e6513dd8 e6513dd4 00000000 00000000 00000000
May 31 10:13:11 harley kernel: 00000001 e8c0c030 dbde5030 edfda2ef 00000298 00000ef0 e8c0c150 c180aa80
May 31 10:13:11 harley kernel: 00000000 c199f480 e6513e14 c012fc27 00200202 e6513eac e6513e20 c199f4b4
May 31 10:13:11 harley kernel: Call Trace:
May 31 10:13:11 harley kernel: [<c012fc27>] unqueue_me+0x7c/0x84
May 31 10:13:11 harley kernel: [<c0130522>] futex_wait+0x1db/0x3bd
May 31 10:13:11 harley kernel: [<c014ee95>] find_extend_vma+0x12/0x49
May 31 10:13:11 harley kernel: [<c012fcb6>] get_futex_key+0x6e/0x126
May 31 10:13:11 harley kernel: [<c0130a00>] futex_wake+0xa9/0xb3
May 31 10:13:11 harley kernel: [<c0115b28>] default_wake_function+0x0/0xc
May 31 10:13:11 harley kernel: [<c0130a79>] do_futex+0x6f/0xf4a
May 31 10:13:11 harley kernel: [<c02a998c>] preempt_schedule+0x3d/0x59
May 31 10:13:11 harley kernel: [<c0115235>] migrate_task+0x46/0x74
May 31 10:13:11 harley kernel: [<c0115fd6>] set_cpus_allowed+0x70/0x8e
May 31 10:13:11 harley kernel: [<c0131a1c>] sys_futex+0xc8/0xda
May 31 10:13:11 harley kernel: [<c0103d3e>] sysenter_past_esp+0x5f/0x85
May 31 10:13:11 harley kernel: =======================
May 31 10:13:11 harley kernel: vrsx S CA655DD8 0 12648 12288 (NOTLB)
May 31 10:13:11 harley kernel: ca655dec 00200082 00000002 ca655dd8 ca655dd4 00000000 00000000 b6c97248
May 31 10:13:11 harley kernel: 00000001 dbde5030 dbde5a90 edfdb21a 00000298 00000f2b dbde5150 c180aa80
May 31 10:13:11 harley kernel: 00000000 c199f480 ca655e14 c012fc27 00200202 ca655eac ca655e20 c199f4b4
May 31 10:13:11 harley kernel: Call Trace:
May 31 10:13:11 harley kernel: [<c012fc27>] unqueue_me+0x7c/0x84
May 31 10:13:11 harley kernel: [<c0130522>] futex_wait+0x1db/0x3bd
May 31 10:13:11 harley kernel: [<c014ee95>] find_extend_vma+0x12/0x49
May 31 10:13:11 harley kernel: [<c012fcb6>] get_futex_key+0x6e/0x126
May 31 10:13:11 harley kernel: [<c0130a00>] futex_wake+0xa9/0xb3
May 31 10:13:11 harley kernel: [<c0115b28>] default_wake_function+0x0/0xc
May 31 10:13:11 harley kernel: [<c0130a79>] do_futex+0x6f/0xf4a
May 31 10:13:11 harley kernel: [<c02a998c>] preempt_schedule+0x3d/0x59
May 31 10:13:11 harley kernel: [<c0115235>] migrate_task+0x46/0x74
May 31 10:13:11 harley kernel: [<c0115fd6>] set_cpus_allowed+0x70/0x8e
May 31 10:13:11 harley kernel: [<c0131a1c>] sys_futex+0xc8/0xda
May 31 10:13:11 harley kernel: [<c0103d3e>] sysenter_past_esp+0x5f/0x85
May 31 10:13:11 harley kernel: =======================
May 31 10:13:11 harley kernel: vrsx S DED1BDD8 0 12649 12288 (NOTLB)
May 31 10:13:11 harley kernel: ded1bdec 00200082 00000002 ded1bdd8 ded1bdd4 00000000 00000060 00200206
May 31 10:13:11 harley kernel: 00000001 dbde5a90 d629aa90 edfdbf8c 00000298 00000d72 dbde5bb0 c180aa80
May 31 10:13:11 harley kernel: 00000000 c199f480 ded1be14 c012fc27 00200202 ded1beac ded1be20 c199f4b4
May 31 10:13:11 harley kernel: Call Trace:
May 31 10:13:11 harley kernel: [<c012fc27>] unqueue_me+0x7c/0x84
May 31 10:13:11 harley kernel: [<c0130522>] futex_wait+0x1db/0x3bd
May 31 10:13:11 harley kernel: [<c014ee95>] find_extend_vma+0x12/0x49
May 31 10:13:11 harley kernel: [<c012fcb6>] get_futex_key+0x6e/0x126
May 31 10:13:11 harley kernel: [<c0130a00>] futex_wake+0xa9/0xb3
May 31 10:13:11 harley kernel: [<c0115b28>] default_wake_function+0x0/0xc
May 31 10:13:11 harley kernel: [<c0130a79>] do_futex+0x6f/0xf4a
May 31 10:13:11 harley kernel: [<c02a998c>] preempt_schedule+0x3d/0x59
May 31 10:13:11 harley kernel: [<c0115235>] migrate_task+0x46/0x74
May 31 10:13:11 harley kernel: [<c0115fd6>] set_cpus_allowed+0x70/0x8e
May 31 10:13:11 harley kernel: [<c0131a1c>] sys_futex+0xc8/0xda
May 31 10:13:11 harley kernel: [<c0103d3e>] sysenter_past_esp+0x5f/0x85
May 31 10:13:11 harley kernel: =======================
May 31 10:13:11 harley kernel: vrsx S DC153DD8 0 12650 12288 (NOTLB)
May 31 10:13:11 harley kernel: dc153dec 00200082 00000002 dc153dd8 dc153dd4 00000000 00000060 00200206
May 31 10:13:11 harley kernel: 00000001 d629aa90 d629a030 edfdce5a 00000298 00000ece d629abb0 c180aa80
May 31 10:13:11 harley kernel: 00000000 c199f480 dc153e14 c012fc27 00200202 dc153eac dc153e20 c199f4b4
May 31 10:13:11 harley kernel: Call Trace:
May 31 10:13:11 harley kernel: [<c012fc27>] unqueue_me+0x7c/0x84
May 31 10:13:11 harley kernel: [<c0130522>] futex_wait+0x1db/0x3bd
May 31 10:13:11 harley kernel: [<c014ee95>] find_extend_vma+0x12/0x49
May 31 10:13:11 harley kernel: [<c012fcb6>] get_futex_key+0x6e/0x126
May 31 10:13:11 harley kernel: [<c0130a00>] futex_wake+0xa9/0xb3
May 31 10:13:11 harley kernel: [<c0115b28>] default_wake_function+0x0/0xc
May 31 10:13:11 harley kernel: [<c0130a79>] do_futex+0x6f/0xf4a
May 31 10:13:11 harley kernel: [<c02a998c>] preempt_schedule+0x3d/0x59
May 31 10:13:11 harley kernel: [<c0115235>] migrate_task+0x46/0x74
May 31 10:13:11 harley kernel: [<c0115fd6>] set_cpus_allowed+0x70/0x8e
May 31 10:13:11 harley kernel: [<c0131a1c>] sys_futex+0xc8/0xda
May 31 10:13:11 harley kernel: [<c0103d3e>] sysenter_past_esp+0x5f/0x85
May 31 10:13:11 harley kernel: =======================
May 31 10:13:11 harley kernel: vrsx S C8E1DDD8 0 12651 12288 (NOTLB)
May 31 10:13:11 harley kernel: c8e1ddec 00200082 00000002 c8e1ddd8 c8e1ddd4 00000000 c199f480 dfe47d34
May 31 10:13:11 harley kernel: 00000001 d629a030 f3bcba90 edfddd76 00000298 00000f1c d629a150 c180aa80
May 31 10:13:11 harley kernel: 00000000 c199f480 c8e1de14 c012fc27 00200202 c8e1deac c8e1de20 c199f4b4
May 31 10:13:11 harley kernel: Call Trace:
May 31 10:13:11 harley kernel: [<c012fc27>] unqueue_me+0x7c/0x84
May 31 10:13:11 harley kernel: [<c0130522>] futex_wait+0x1db/0x3bd
May 31 10:13:11 harley kernel: [<c014ee95>] find_extend_vma+0x12/0x49
May 31 10:13:11 harley kernel: [<c012fcb6>] get_futex_key+0x6e/0x126
May 31 10:13:11 harley kernel: [<c0130a00>] futex_wake+0xa9/0xb3
May 31 10:13:11 harley kernel: [<c0115b28>] default_wake_function+0x0/0xc
May 31 10:13:11 harley kernel: [<c0130a79>] do_futex+0x6f/0xf4a
May 31 10:13:11 harley kernel: [<c02a998c>] preempt_schedule+0x3d/0x59
May 31 10:13:11 harley kernel: [<c0115235>] migrate_task+0x46/0x74
May 31 10:13:11 harley kernel: [<c0115fd6>] set_cpus_allowed+0x70/0x8e
May 31 10:13:11 harley kernel: [<c0131a1c>] sys_futex+0xc8/0xda
May 31 10:13:11 harley kernel: [<c0103d3e>] sysenter_past_esp+0x5f/0x85
May 31 10:13:11 harley kernel: =======================
May 31 10:13:11 harley kernel: vrsx S C3F61DD8 0 12652 12288 (NOTLB)
May 31 10:13:11 harley kernel: c3f61dec 00200082 00000002 c3f61dd8 c3f61dd4 00000000 00000060 00200206
May 31 10:13:11 harley kernel: 00000001 f3bcba90 f75b8030 edfdebdd 00000298 00000e67 f3bcbbb0 c180aa80
May 31 10:13:11 harley kernel: 00000000 c199f480 c3f61e14 c012fc27 00200202 c3f61eac c3f61e20 c199f4b4
May 31 10:13:11 harley kernel: Call Trace:
May 31 10:13:11 harley kernel: [<c012fc27>] unqueue_me+0x7c/0x84
May 31 10:13:11 harley kernel: [<c0130522>] futex_wait+0x1db/0x3bd
May 31 10:13:11 harley kernel: [<c014ee95>] find_extend_vma+0x12/0x49
May 31 10:13:11 harley kernel: [<c012fcb6>] get_futex_key+0x6e/0x126
May 31 10:13:11 harley kernel: [<c0130a00>] futex_wake+0xa9/0xb3
May 31 10:13:11 harley kernel: [<c0115b28>] default_wake_function+0x0/0xc
May 31 10:13:11 harley kernel: [<c0130a79>] do_futex+0x6f/0xf4a
May 31 10:13:11 harley kernel: [<c02a998c>] preempt_schedule+0x3d/0x59
May 31 10:13:11 harley kernel: [<c0115235>] migrate_task+0x46/0x74
May 31 10:13:11 harley kernel: [<c0115fd6>] set_cpus_allowed+0x70/0x8e
May 31 10:13:11 harley kernel: [<c0131a1c>] sys_futex+0xc8/0xda
May 31 10:13:11 harley kernel: [<c0103d3e>] sysenter_past_esp+0x5f/0x85
May 31 10:13:11 harley kernel: =======================
May 31 10:13:11 harley kernel: vrsx S C4025DD8 0 12653 12288 (NOTLB)
May 31 10:13:11 harley kernel: c4025dec 00200082 00000002 c4025dd8 c4025dd4 00000000 f75b8030 0ddeea02
May 31 10:13:11 harley kernel: 00000001 f75b8030 f75b8a90 edfdf9d6 00000298 00000df9 f75b8150 c180aa80
May 31 10:13:11 harley kernel: 00000000 c199f480 c4025e14 c012fc27 00200202 c4025eac c4025e20 c199f4b4
May 31 10:13:11 harley kernel: Call Trace:
May 31 10:13:11 harley kernel: [<c012fc27>] unqueue_me+0x7c/0x84
May 31 10:13:11 harley kernel: [<c0130522>] futex_wait+0x1db/0x3bd
May 31 10:13:11 harley kernel: [<c014ee95>] find_extend_vma+0x12/0x49
May 31 10:13:11 harley kernel: [<c012fcb6>] get_futex_key+0x6e/0x126
May 31 10:13:11 harley kernel: [<c0130a00>] futex_wake+0xa9/0xb3
May 31 10:13:11 harley kernel: [<c0115b28>] default_wake_function+0x0/0xc
May 31 10:13:11 harley kernel: [<c0130a79>] do_futex+0x6f/0xf4a
May 31 10:13:11 harley kernel: [<c02a998c>] preempt_schedule+0x3d/0x59
May 31 10:13:11 harley kernel: [<c0115235>] migrate_task+0x46/0x74
May 31 10:13:11 harley kernel: [<c0115fd6>] set_cpus_allowed+0x70/0x8e
May 31 10:13:11 harley kernel: [<c0131a1c>] sys_futex+0xc8/0xda
May 31 10:13:11 harley kernel: [<c0103d3e>] sysenter_past_esp+0x5f/0x85
May 31 10:13:11 harley kernel: =======================
May 31 10:13:11 harley kernel: vrsx S F4F5DDD8 0 12654 12288 (NOTLB)
May 31 10:13:11 harley kernel: f4f5ddec 00200082 00000002 f4f5ddd8 f4f5ddd4 00000000 f75b8a90 0df68c4e
May 31 10:13:11 harley kernel: 00000001 f75b8a90 c4846a90 edfe094e 00000298 00000f78 f75b8bb0 c180aa80
May 31 10:13:11 harley kernel: 00000000 c199f480 f4f5de14 c012fc27 00200202 f4f5deac f4f5de20 c199f4b4
May 31 10:13:11 harley kernel: Call Trace:
May 31 10:13:11 harley kernel: [<c012fc27>] unqueue_me+0x7c/0x84
May 31 10:13:11 harley kernel: [<c0130522>] futex_wait+0x1db/0x3bd
May 31 10:13:11 harley kernel: [<c014ee95>] find_extend_vma+0x12/0x49
May 31 10:13:11 harley kernel: [<c012fcb6>] get_futex_key+0x6e/0x126
May 31 10:13:11 harley kernel: [<c0130a00>] futex_wake+0xa9/0xb3
May 31 10:13:11 harley kernel: [<c0115b28>] default_wake_function+0x0/0xc
May 31 10:13:11 harley kernel: [<c0130a79>] do_futex+0x6f/0xf4a
May 31 10:13:11 harley kernel: [<c02a998c>] preempt_schedule+0x3d/0x59
May 31 10:13:11 harley kernel: [<c0115235>] migrate_task+0x46/0x74
May 31 10:13:11 harley kernel: [<c0115fd6>] set_cpus_allowed+0x70/0x8e
May 31 10:13:11 harley kernel: [<c0131a1c>] sys_futex+0xc8/0xda
May 31 10:13:11 harley kernel: [<c0103d3e>] sysenter_past_esp+0x5f/0x85
May 31 10:13:11 harley kernel: =======================
May 31 10:13:11 harley kernel: vrsx S F1AD1DD8 0 12655 12288 (NOTLB)
May 31 10:13:11 harley kernel: f1ad1dec 00200082 00000002 f1ad1dd8 f1ad1dd4 00000000 c4846a90 0e11dd88
May 31 10:13:11 harley kernel: 00000001 c4846a90 dc68aa90 edfe16d9 00000298 00000d8b c4846bb0 c180aa80
May 31 10:13:11 harley kernel: 00000000 c199f480 f1ad1e14 c012fc27 00200202 f1ad1eac f1ad1e20 c199f4b4
May 31 10:13:11 harley kernel: Call Trace:
May 31 10:13:11 harley kernel: [<c012fc27>] unqueue_me+0x7c/0x84
May 31 10:13:11 harley kernel: [<c0130522>] futex_wait+0x1db/0x3bd
May 31 10:13:11 harley kernel: [<c014ee95>] find_extend_vma+0x12/0x49
May 31 10:13:11 harley kernel: [<c012fcb6>] get_futex_key+0x6e/0x126
May 31 10:13:11 harley kernel: [<c0130a00>] futex_wake+0xa9/0xb3
May 31 10:13:11 harley kernel: [<c0115b28>] default_wake_function+0x0/0xc
May 31 10:13:11 harley kernel: [<c0130a79>] do_futex+0x6f/0xf4a
May 31 10:13:11 harley kernel: [<c02a998c>] preempt_schedule+0x3d/0x59
May 31 10:13:11 harley kernel: [<c0115235>] migrate_task+0x46/0x74
May 31 10:13:11 harley kernel: [<c0115fd6>] set_cpus_allowed+0x70/0x8e
May 31 10:13:11 harley kernel: [<c0131a1c>] sys_futex+0xc8/0xda
May 31 10:13:11 harley kernel: [<c0103d3e>] sysenter_past_esp+0x5f/0x85
May 31 10:13:11 harley kernel: =======================
May 31 10:13:11 harley kernel: vrsx S F0A07DD8 0 12656 12288 (NOTLB)
May 31 10:13:11 harley kernel: f0a07dec 00200082 00000002 f0a07dd8 f0a07dd4 00000000 dc68aa90 0e2c331d
May 31 10:13:11 harley kernel: 00000001 dc68aa90 dc68a560 edfe254f 00000298 00000e76 dc68abb0 c180aa80
May 31 10:13:11 harley kernel: 00000000 c199f480 f0a07e14 c012fc27 00200202 f0a07eac f0a07e20 c199f4b4
May 31 10:13:11 harley kernel: Call Trace:
May 31 10:13:11 harley kernel: [<c012fc27>] unqueue_me+0x7c/0x84
May 31 10:13:11 harley kernel: [<c0130522>] futex_wait+0x1db/0x3bd
May 31 10:13:11 harley kernel: [<c014ee95>] find_extend_vma+0x12/0x49
May 31 10:13:11 harley kernel: [<c012fcb6>] get_futex_key+0x6e/0x126
May 31 10:13:11 harley kernel: [<c0130a00>] futex_wake+0xa9/0xb3
May 31 10:13:11 harley kernel: [<c0115b28>] default_wake_function+0x0/0xc
May 31 10:13:11 harley kernel: [<c0130a79>] do_futex+0x6f/0xf4a
May 31 10:13:11 harley kernel: [<c02a998c>] preempt_schedule+0x3d/0x59
May 31 10:13:11 harley kernel: [<c0115235>] migrate_task+0x46/0x74
May 31 10:13:11 harley kernel: [<c0115fd6>] set_cpus_allowed+0x70/0x8e
May 31 10:13:11 harley kernel: [<c0131a1c>] sys_futex+0xc8/0xda
May 31 10:13:11 harley kernel: [<c0103d3e>] sysenter_past_esp+0x5f/0x85
May 31 10:13:11 harley kernel: =======================
May 31 10:13:11 harley kernel: vrsx S F0293DD8 0 12657 12288 (NOTLB)
May 31 10:13:11 harley kernel: f0293dec 00200082 00000002 f0293dd8 f0293dd4 00000000 00000060 00200287
May 31 10:13:11 harley kernel: 00000001 dc68a560 f0f04a90 edfe32a3 00000298 00000d54 dc68a680 c180aa80
May 31 10:13:11 harley kernel: 00000000 c199f480 f0293e14 c012fc27 00200202 f0293eac f0293e20 c199f4b4
May 31 10:13:11 harley kernel: Call Trace:
May 31 10:13:11 harley kernel: [<c012fc27>] unqueue_me+0x7c/0x84
May 31 10:13:11 harley kernel: [<c0130522>] futex_wait+0x1db/0x3bd
May 31 10:13:11 harley kernel: [<c014ee95>] find_extend_vma+0x12/0x49
May 31 10:13:11 harley kernel: [<c012fcb6>] get_futex_key+0x6e/0x126
May 31 10:13:11 harley kernel: [<c0130a00>] futex_wake+0xa9/0xb3
May 31 10:13:11 harley kernel: [<c0115b28>] default_wake_function+0x0/0xc
May 31 10:13:11 harley kernel: [<c0130a79>] do_futex+0x6f/0xf4a
May 31 10:13:11 harley kernel: [<c02a998c>] preempt_schedule+0x3d/0x59
May 31 10:13:11 harley kernel: [<c0115235>] migrate_task+0x46/0x74
May 31 10:13:11 harley kernel: [<c0115fd6>] set_cpus_allowed+0x70/0x8e
May 31 10:13:11 harley kernel: [<c0131a1c>] sys_futex+0xc8/0xda
May 31 10:13:11 harley kernel: [<c0103d3e>] sysenter_past_esp+0x5f/0x85
May 31 10:13:11 harley kernel: =======================
May 31 10:13:11 harley kernel: vrsx S E5C0FDD8 0 12658 12288 (NOTLB)
May 31 10:13:11 harley kernel: e5c0fdec 00200082 00000002 e5c0fdd8 e5c0fdd4 00000000 f0f04a90 0e5de043
May 31 10:13:11 harley kernel: 00000001 f0f04a90 dabeb030 edfe3fbd 00000298 00000d1a f0f04bb0 c180aa80
May 31 10:13:11 harley kernel: 00000000 c199f480 e5c0fe14 c012fc27 00200202 e5c0feac e5c0fe20 c199f4b4
May 31 10:13:11 harley kernel: Call Trace:
May 31 10:13:11 harley kernel: [<c012fc27>] unqueue_me+0x7c/0x84
May 31 10:13:11 harley kernel: [<c0130522>] futex_wait+0x1db/0x3bd
May 31 10:13:11 harley kernel: [<c014ee95>] find_extend_vma+0x12/0x49
May 31 10:13:11 harley kernel: [<c012fcb6>] get_futex_key+0x6e/0x126
May 31 10:13:11 harley kernel: [<c0130a00>] futex_wake+0xa9/0xb3
May 31 10:13:11 harley kernel: [<c0115b28>] default_wake_function+0x0/0xc
May 31 10:13:11 harley kernel: [<c0130a79>] do_futex+0x6f/0xf4a
May 31 10:13:11 harley kernel: [<c02a998c>] preempt_schedule+0x3d/0x59
May 31 10:13:11 harley kernel: [<c0115235>] migrate_task+0x46/0x74
May 31 10:13:11 harley kernel: [<c0115fd6>] set_cpus_allowed+0x70/0x8e
May 31 10:13:11 harley kernel: [<c0131a1c>] sys_futex+0xc8/0xda
May 31 10:13:11 harley kernel: [<c0103d3e>] sysenter_past_esp+0x5f/0x85
May 31 10:13:11 harley kernel: =======================
May 31 10:13:11 harley kernel: vrsx S F2FC3DD8 0 12659 12288 (NOTLB)
May 31 10:13:11 harley kernel: f2fc3dec 00200082 00000002 f2fc3dd8 f2fc3dd4 00000000 dabeb030 0e76131e
May 31 10:13:11 harley kernel: 00000001 dabeb030 dabeb560 edfe4d8c 00000298 00000dcf dabeb150 c180aa80
May 31 10:13:11 harley kernel: 00000000 c199f480 f2fc3e14 c012fc27 00200202 f2fc3eac f2fc3e20 c199f4b4
May 31 10:13:11 harley kernel: Call Trace:
May 31 10:13:11 harley kernel: [<c012fc27>] unqueue_me+0x7c/0x84
May 31 10:13:11 harley kernel: [<c0130522>] futex_wait+0x1db/0x3bd
May 31 10:13:11 harley kernel: [<c014ee95>] find_extend_vma+0x12/0x49
May 31 10:13:11 harley kernel: [<c012fcb6>] get_futex_key+0x6e/0x126
May 31 10:13:11 harley kernel: [<c0130a00>] futex_wake+0xa9/0xb3
May 31 10:13:11 harley kernel: [<c0115b28>] default_wake_function+0x0/0xc
May 31 10:13:11 harley kernel: [<c0130a79>] do_futex+0x6f/0xf4a
May 31 10:13:11 harley kernel: [<c02a998c>] preempt_schedule+0x3d/0x59
May 31 10:13:11 harley kernel: [<c0115235>] migrate_task+0x46/0x74
May 31 10:13:11 harley kernel: [<c0115fd6>] set_cpus_allowed+0x70/0x8e
May 31 10:13:11 harley kernel: [<c0131a1c>] sys_futex+0xc8/0xda
May 31 10:13:11 harley kernel: [<c0103d3e>] sysenter_past_esp+0x5f/0x85
May 31 10:13:11 harley kernel: =======================
May 31 10:13:11 harley kernel: vrsx S E353BDD8 0 12660 12288 (NOTLB)
May 31 10:13:11 harley kernel: e353bdec 00200082 00000002 e353bdd8 e353bdd4 00000000 dabeb560 0e8d67f3
May 31 10:13:11 harley kernel: 00000001 dabeb560 c19f3560 edfe5a65 00000298 00000cd9 dabeb680 c180aa80
May 31 10:13:11 harley kernel: 00000000 c199f480 e353be14 c012fc27 00200202 e353beac e353be20 c199f4b4
May 31 10:13:11 harley kernel: Call Trace:
May 31 10:13:11 harley kernel: [<c012fc27>] unqueue_me+0x7c/0x84
May 31 10:13:11 harley kernel: [<c0130522>] futex_wait+0x1db/0x3bd
May 31 10:13:11 harley kernel: [<c014ee95>] find_extend_vma+0x12/0x49
May 31 10:13:11 harley kernel: [<c012fcb6>] get_futex_key+0x6e/0x126
May 31 10:13:11 harley kernel: [<c0130a00>] futex_wake+0xa9/0xb3
May 31 10:13:11 harley kernel: [<c0115b28>] default_wake_function+0x0/0xc
May 31 10:13:11 harley kernel: [<c0130a79>] do_futex+0x6f/0xf4a
May 31 10:13:11 harley kernel: [<c02a998c>] preempt_schedule+0x3d/0x59
May 31 10:13:11 harley kernel: [<c0115235>] migrate_task+0x46/0x74
May 31 10:13:11 harley kernel: [<c0115fd6>] set_cpus_allowed+0x70/0x8e
May 31 10:13:11 harley kernel: [<c0131a1c>] sys_futex+0xc8/0xda
May 31 10:13:11 harley kernel: [<c0103d3e>] sysenter_past_esp+0x5f/0x85
May 31 10:13:11 harley kernel: =======================
May 31 10:13:11 harley kernel: vrsx S E5063DD8 0 12661 12288 (NOTLB)
May 31 10:13:11 harley kernel: e5063dec 00200082 00000002 e5063dd8 e5063dd4 00000000 00000000 00000000
May 31 10:13:11 harley kernel: 00000001 e77e8560 c032e360 0eac2c38 00000297 00002691 e77e8680 c180aa80
May 31 10:13:11 harley kernel: 00000000 c199f480 00097b02 ffffffff 00000000 00000000 e5063e20 c199f4b4
May 31 10:13:11 harley kernel: Call Trace:
May 31 10:13:11 harley kernel: [<c0130522>] futex_wait+0x1db/0x3bd
May 31 10:13:11 harley kernel: [<c02a97ae>] __sched_text_start+0x73e/0x7f0
May 31 10:13:11 harley kernel: [<c0115b28>] default_wake_function+0x0/0xc
May 31 10:13:11 harley kernel: [<c0130a79>] do_futex+0x6f/0xf4a
May 31 10:13:11 harley kernel: [<c02a998c>] preempt_schedule+0x3d/0x59
May 31 10:13:11 harley kernel: [<c0115235>] migrate_task+0x46/0x74
May 31 10:13:11 harley kernel: [<c0115fd6>] set_cpus_allowed+0x70/0x8e
May 31 10:13:11 harley kernel: [<c0131a1c>] sys_futex+0xc8/0xda
May 31 10:13:11 harley kernel: [<c0103d3e>] sysenter_past_esp+0x5f/0x85
May 31 10:13:11 harley kernel: [<c02a0000>] clone_policy+0x42/0xe5
May 31 10:13:11 harley kernel: =======================
May 31 10:13:11 harley kernel: vrsx S E6E5DB50 0 12662 12288 (NOTLB)
May 31 10:13:11 harley kernel: e6e5db64 00200082 00000002 e6e5db50 e6e5db4c 00000000 00000000 00000000
May 31 10:13:11 harley kernel: 00000001 e77e8a90 c032e360 fba335be 00000312 00003057 e77e8bb0 c180aa80
May 31 10:13:11 harley kernel: 00000000 c199f480 000b83bb fffffffd 00000000 00000000 e6e5db74 000b8401
May 31 10:13:11 harley kernel: Call Trace:
May 31 10:13:11 harley kernel: [<c02aa015>] schedule_timeout+0x70/0x8d
May 31 10:13:11 harley kernel: [<c01211af>] process_timeout+0x0/0x5
May 31 10:13:11 harley kernel: [<c0167e1b>] do_select+0x393/0x3e0
May 31 10:13:11 harley kernel: [<c01683a8>] __pollwait+0x0/0xac
May 31 10:13:11 harley kernel: [<c0115b28>] default_wake_function+0x0/0xc
May 31 10:13:11 harley kernel: [<c0115b28>] default_wake_function+0x0/0xc
May 31 10:13:11 harley kernel: [<c0113ba9>] __activate_task+0x1c/0x29
May 31 10:13:11 harley kernel: [<c0115b1e>] try_to_wake_up+0x399/0x3a3
May 31 10:13:11 harley kernel: [<c01299c1>] autoremove_wake_function+0x15/0x35
May 31 10:13:11 harley kernel: [<c011377e>] __wake_up_common+0x32/0x55
May 31 10:13:11 harley kernel: [<c0113cca>] __wake_up+0x32/0x43
May 31 10:13:11 harley kernel: [<c0126ce0>] insert_work+0x61/0x65
May 31 10:13:11 harley kernel: [<c02a97ae>] __sched_text_start+0x73e/0x7f0
May 31 10:13:11 harley kernel: [<c02a97d5>] __sched_text_start+0x765/0x7f0
May 31 10:13:11 harley kernel: [<c0115b1e>] try_to_wake_up+0x399/0x3a3
May 31 10:13:11 harley kernel: [<c012bddd>] lock_hrtimer_base+0x15/0x2f
May 31 10:13:11 harley kernel: [<c016cefb>] iput+0x39/0x62
May 31 10:13:11 harley kernel: [<c012fc27>] unqueue_me+0x7c/0x84
May 31 10:13:11 harley kernel: [<c013067c>] futex_wait+0x335/0x3bd
May 31 10:13:11 harley kernel: [<c0113ba9>] __activate_task+0x1c/0x29
May 31 10:13:11 harley kernel: [<c0115b1e>] try_to_wake_up+0x399/0x3a3
May 31 10:13:11 harley kernel: [<c0168105>] core_sys_select+0x29d/0x2ba
May 31 10:13:11 harley kernel: [<c011377e>] __wake_up_common+0x32/0x55
May 31 10:13:11 harley kernel: [<c0113cca>] __wake_up+0x32/0x43
May 31 10:13:11 harley kernel: [<c012f678>] wake_futex+0x3b/0x45
May 31 10:13:11 harley kernel: [<c01318eb>] do_futex+0xee1/0xf4a
May 31 10:13:11 harley kernel: [<c02a9817>] __sched_text_start+0x7a7/0x7f0
May 31 10:13:11 harley kernel: [<c0113b01>] set_load_weight+0x4a/0x8e
May 31 10:13:11 harley kernel: [<c01684f8>] sys_select+0xa4/0x187
May 31 10:13:11 harley kernel: [<c0103d3e>] sysenter_past_esp+0x5f/0x85
May 31 10:13:11 harley kernel: =======================
May 31 10:13:11 harley kernel: vi S DB833B50 0 12663 12276 (NOTLB)
May 31 10:13:11 harley kernel: db833b64 00200082 00000002 db833b50 db833b4c 00000000 00000000 f706b2e0
May 31 10:13:11 harley kernel: 00000006 e8c0ca90 c032e360 895791b6 000002ad 00005cff e8c0cbb0 c180aa80
May 31 10:13:11 harley kernel: 00000000 dfce7c40 0009d902 fffffffd 00000000 00000000 7fffffff d2aa0140
May 31 10:13:11 harley kernel: Call Trace:
May 31 10:13:11 harley kernel: [<c02a9fb8>] schedule_timeout+0x13/0x8d
May 31 10:13:11 harley kernel: [<c020942b>] tty_poll+0x56/0x63
May 31 10:13:11 harley kernel: [<c0167e1b>] do_select+0x393/0x3e0
May 31 10:13:11 harley kernel: [<c01683a8>] __pollwait+0x0/0xac
May 31 10:13:11 harley kernel: [<c0115b28>] default_wake_function+0x0/0xc
May 31 10:13:11 harley kernel: [<c0115b28>] default_wake_function+0x0/0xc
May 31 10:13:11 harley kernel: [<f8901d87>] do_get_write_access+0x58d/0x5ba [jbd]
May 31 10:13:11 harley kernel: [<f8917ab0>] __ext3_get_inode_loc+0x10f/0x2dd [ext3]
May 31 10:13:11 harley kernel: [<f8923196>] __ext3_journal_dirty_metadata+0x16/0x3a [ext3]
May 31 10:13:11 harley kernel: [<f8901dd5>] journal_get_write_access+0x21/0x26 [jbd]
May 31 10:13:11 harley kernel: [<f891793b>] ext3_mark_iloc_dirty+0x27f/0x2e5 [ext3]
May 31 10:13:11 harley kernel: [<f8917d0c>] ext3_mark_inode_dirty+0x20/0x27 [ext3]
May 31 10:13:11 harley kernel: [<c0113cca>] __wake_up+0x32/0x43
May 31 10:13:11 harley kernel: [<f89017ee>] journal_stop+0x294/0x2a0 [jbd]
May 31 10:13:11 harley kernel: [<f891ef33>] __ext3_journal_stop+0x19/0x34 [ext3]
May 31 10:13:11 harley kernel: [<f8919c14>] ext3_ordered_commit_write+0xba/0xd8 [ext3]
May 31 10:13:11 harley kernel: [<c01477df>] __pagevec_lru_add+0x91/0x9c
May 31 10:13:11 harley kernel: [<c0113d42>] task_rq_lock+0x31/0x58
May 31 10:13:11 harley kernel: [<c0115b1e>] try_to_wake_up+0x399/0x3a3
May 31 10:13:11 harley kernel: [<c02a97ae>] __sched_text_start+0x73e/0x7f0
May 31 10:13:11 harley kernel: [<c02a97d5>] __sched_text_start+0x765/0x7f0
May 31 10:13:11 harley kernel: [<c011377e>] __wake_up_common+0x32/0x55
May 31 10:13:11 harley kernel: [<c0113cca>] __wake_up+0x32/0x43
May 31 10:13:11 harley kernel: [<c0168105>] core_sys_select+0x29d/0x2ba
May 31 10:13:11 harley kernel: [<c0113ba9>] __activate_task+0x1c/0x29
May 31 10:13:11 harley kernel: [<c0115b1e>] try_to_wake_up+0x399/0x3a3
May 31 10:13:11 harley kernel: [<c011377e>] __wake_up_common+0x32/0x55
May 31 10:13:11 harley kernel: [<c0113cca>] __wake_up+0x32/0x43
May 31 10:13:11 harley kernel: [<c02085ac>] tty_wakeup+0x4c/0x50
May 31 10:13:11 harley kernel: [<c0113cca>] __wake_up+0x32/0x43
May 31 10:13:11 harley kernel: [<c0208509>] tty_ldisc_deref+0x55/0x64
May 31 10:13:11 harley kernel: [<c016852a>] sys_select+0xd6/0x187
May 31 10:13:11 harley kernel: [<c0103d3e>] sysenter_past_esp+0x5f/0x85
May 31 10:13:11 harley kernel: [<c02a0000>] clone_policy+0x42/0xe5
May 31 10:13:11 harley kernel: =======================


Attachments:
sysrq.txt (25.17 kB)

2007-05-31 17:06:21

by Oleg Nesterov

[permalink] [raw]
Subject: Re: floppy.c soft lockup

On 05/31, Mark Hounschell wrote:
>
> Andrew Morton wrote:
> > On Tue, 29 May 2007 13:31:05 -0400 Mark Hounschell <[email protected]> wrote:
> >
> >> Changes in floppy.c between 2.6.17 and 2.6.18 have broken an application I have. I have tracked
> >> it down to a single line of code. When the following patch is applied to the version in 2.6.18,
> >> my application works.
> >>
> >> --- linux-2.6.18/drivers/block/floppy.c 2006-09-19 23:42:06.000000000 -0400
> >> +++ linux-2.6.18-crt/drivers/block/floppy.c 2007-05-29 09:12:20.000000000 -0400
> >> @@ -893,7 +893,6 @@
> >> set_current_state(TASK_RUNNING);
> >> remove_wait_queue(&fdc_wait, &wait);
> >>
> >> - flush_scheduled_work();
> >> }
> >> command_status = FD_COMMAND_NONE;
> >>
> > Interesting. I'd expect that the calling process is spinning with realtime
> > policy and expecting some other process to do something (i.e. run a workqueue).
> >
> > If you keep the process and irq affinities, and disable the realtime policy
> > does that also prevent the problem?
> >
>
> Yes it does.
>
> > It would be interesting if you could capture a few task traces while it is stuck:
> > echo 1 > /proc/sys/kernel/sysrq, then do ALT-SYSRQ-P a bunch of times and ALT-SYSRQ-T,
> > and see if you can work out where the CPU is stuck.
> >
>
> I've attached the syslog output as a result of doing the above. I can't really make any kind of
> determination from it myself, as I don't really know what I'm looking at.

Could you show the full output? There are no events/* threads or processes doing ioctl()
in the sysrq.txt you attached.

> > Also, 2.6.22-rc3 might have accidentally fixed this.
> >
>
> No. Same thing there. The attached traces were taken with 2.6.22-rc3.
>
> Basically the main RT-process (which is a CPU bound process on processor-2) signals a
> thread to do some I/O. That RT-thread (running on the other processor) does a simple

If the main RT-process monopolizes processor-2, flush_workqueue() (or cancel_work_sync())
can of course hang; there is nothing we can do about that.

> ioctl(Q->DevSpec1, FDSETPRM, &medprm)
>
> and there is no return from the call. That thread is hung.

What happens if you kill the main RT-process?

Could you try the patch below? Just to see if it makes any difference.

Oleg.

(against 2.6.22-rcX)

--- OLD/drivers/block/floppy.c~ 2007-04-03 13:04:58.000000000 +0400
+++ OLD/drivers/block/floppy.c 2007-05-31 20:50:18.000000000 +0400
@@ -862,6 +862,8 @@ static void set_fdc(int drive)
FDCS->reset = 1;
}

+static DECLARE_WORK(floppy_work, NULL);
+
/* locks the driver */
static int _lock_fdc(int drive, int interruptible, int line)
{
@@ -893,7 +895,7 @@ static int _lock_fdc(int drive, int inte
set_current_state(TASK_RUNNING);
remove_wait_queue(&fdc_wait, &wait);

- flush_scheduled_work();
+ cancel_work_sync(&floppy_work);
}
command_status = FD_COMMAND_NONE;

@@ -992,8 +994,6 @@ static void empty(void)
{
}

-static DECLARE_WORK(floppy_work, NULL);
-
static void schedule_bh(void (*handler) (void))
{
PREPARE_WORK(&floppy_work, (work_func_t)handler);

2007-05-31 18:00:57

by Mark Hounschell

Subject: Re: floppy.c soft lockup

Oleg Nesterov wrote:
> On 05/31, Mark Hounschell wrote:
>> Andrew Morton wrote:
>>> On Tue, 29 May 2007 13:31:05 -0400 Mark Hounschell <[email protected]> wrote:
>>>
>>>> Changes in floppy.c between 2.6.17 and 2.6.18 have broken an application I have. I have tracked
>>>> it down to a single line of code. When the following patch is applied to the version in 2.6.18,
>>>> my application works.
>>>>
>>>> --- linux-2.6.18/drivers/block/floppy.c 2006-09-19 23:42:06.000000000 -0400
>>>> +++ linux-2.6.18-crt/drivers/block/floppy.c 2007-05-29 09:12:20.000000000 -0400
>>>> @@ -893,7 +893,6 @@
>>>> set_current_state(TASK_RUNNING);
>>>> remove_wait_queue(&fdc_wait, &wait);
>>>>
>>>> - flush_scheduled_work();
>>>> }
>>>> command_status = FD_COMMAND_NONE;
>>>>
>>> Interesting. I'd expect that the calling process is spinning, with realtime
>>> policy and is expecting some other process to do something (ie: run a workqueue).
>>>
>>> If you keep the process and irq affinities, and disable the realtime policy
>>> does that also prevent the problem?
>>>
>> Yes it does.
>>
>>> It would be interesting it you could capture a few task traces while it is stuck:
>>> echo 1 > /proc/sys/kernel/sysrq then do ALT-SYSRQ-P a bunch of times and ALT-SYSRQ-T,
>>> see if you can work out where the CPU is stuck.
>>>
>> I've attached the syslog output as a result of doing the above. I can't really make any kind of
>> determination from it myself as I don't really knowing what I'm looking at.
>
> Could you show the full output? There are no events/* or process doing ioctl()
> in sysrq.txt you attached.
>
>>> ALso, 2.6.22-rc3 might have accidentally fixed this.
>>>
>> No. Same thing there. The traces attached are using 2.6.22-rc3.
>>
>> Basically the main RT-process (which is a CPU bound process on processor-2) signals a
>> thread to do some I/O. That RT-thread (running on the other processor) does a simple
>
> If the main RT-process monopolizes processor-2, flush_workqueue() (or cancel_work_sync())
> can hang of course, we can do nothing.
>
>> ioctl(Q->DevSpec1, FDSETPRM, &medprm)
>>
>> and there is no return from the call. That thread is hung.
>
> What happens if you kill the main RT-process?
>
> Could you try the patch below? Just to see if it makes any difference.
>
> Oleg.
>
> (against 2.6.22-rcX)
>
> --- OLD/drivers/block/floppy.c~ 2007-04-03 13:04:58.000000000 +0400
> +++ OLD/drivers/block/floppy.c 2007-05-31 20:50:18.000000000 +0400
> @@ -862,6 +862,8 @@ static void set_fdc(int drive)
> FDCS->reset = 1;
> }
>
> +static DECLARE_WORK(floppy_work, NULL);
> +
> /* locks the driver */
> static int _lock_fdc(int drive, int interruptible, int line)
> {
> @@ -893,7 +895,7 @@ static int _lock_fdc(int drive, int inte
> set_current_state(TASK_RUNNING);
> remove_wait_queue(&fdc_wait, &wait);
>
> - flush_scheduled_work();
> + cancel_work_sync(&floppy_work);
> }
> command_status = FD_COMMAND_NONE;
>
> @@ -992,8 +994,6 @@ static void empty(void)
> {
> }
>
> -static DECLARE_WORK(floppy_work, NULL);
> -
> static void schedule_bh(void (*handler) (void))
> {
> PREPARE_WORK(&floppy_work, (work_func_t)handler);
>

The patch does make it work. Would you like me to try again to get a trace with
something meaningful in it?

Regards
Mark

2007-05-31 18:44:46

by Mark Hounschell

Subject: Re: floppy.c soft lockup

Oleg Nesterov wrote:
> On 05/31, Mark Hounschell wrote:
>> Andrew Morton wrote:
>>> On Tue, 29 May 2007 13:31:05 -0400 Mark Hounschell <[email protected]> wrote:
>>>
>>>> Changes in floppy.c from 2.6.17 and 2.6.18 have broken an application I have. I have tracked
>>>> it down to a single line of code. When the following patch is applied to the version in 2.6.18
>>>> my application works.
>>>>
>>>> --- linux-2.6.18/drivers/block/floppy.c 2006-09-19 23:42:06.000000000 -0400
>>>> +++ linux-2.6.18-crt/drivers/block/floppy.c 2007-05-29 09:12:20.000000000 -0400
>>>> @@ -893,7 +893,6 @@
>>>> set_current_state(TASK_RUNNING);
>>>> remove_wait_queue(&fdc_wait, &wait);
>>>>
>>>> - flush_scheduled_work();
>>>> }
>>>> command_status = FD_COMMAND_NONE;
>>>>
>>> Interesting. I'd expect that the calling process is spinning, with realtime
>>> policy and is expecting some other process to do something (ie: run a workqueue).
>>>
>>> If you keep the process and irq affinities, and disable the realtime policy
>>> does that also prevent the problem?
>>>
>> Yes it does.
>>
>>> It would be interesting it you could capture a few task traces while it is stuck:
>>> echo 1 > /proc/sys/kernel/sysrq then do ALT-SYSRQ-P a bunch of times and ALT-SYSRQ-T,
>>> see if you can work out where the CPU is stuck.
>>>
>> I've attached the syslog output as a result of doing the above. I can't really make any kind of
>> determination from it myself as I don't really knowing what I'm looking at.
>
> Could you show the full output? There are no events/* or process doing ioctl()
> in sysrq.txt you attached.
>
>>> ALso, 2.6.22-rc3 might have accidentally fixed this.
>>>
>> No. Same thing there. The traces attached are using 2.6.22-rc3.
>>
>> Basically the main RT-process (which is a CPU bound process on processor-2) signals a
>> thread to do some I/O. That RT-thread (running on the other processor) does a simple
>
> If the main RT-process monopolizes processor-2, flush_workqueue() (or cancel_work_sync())
> can hang of course, we can do nothing.
>
>> ioctl(Q->DevSpec1, FDSETPRM, &medprm)
>>
>> and there is no return from the call. That thread is hung.
>
> What happens if you kill the main RT-process?
>

When I kill the main process all its threads also go away, including the floppy thread.
Nothing notable happens with this kernel. On previous kernels (2.6.18) I would get a dump
from the floppy driver in the syslog when I killed the process.

> Could you try the patch below? Just to see if it makes any difference.
>
> Oleg.
>
> (against 2.6.22-rcX)
>
> --- OLD/drivers/block/floppy.c~ 2007-04-03 13:04:58.000000000 +0400
> +++ OLD/drivers/block/floppy.c 2007-05-31 20:50:18.000000000 +0400
> @@ -862,6 +862,8 @@ static void set_fdc(int drive)
> FDCS->reset = 1;
> }
>
> +static DECLARE_WORK(floppy_work, NULL);
> +
> /* locks the driver */
> static int _lock_fdc(int drive, int interruptible, int line)
> {
> @@ -893,7 +895,7 @@ static int _lock_fdc(int drive, int inte
> set_current_state(TASK_RUNNING);
> remove_wait_queue(&fdc_wait, &wait);
>
> - flush_scheduled_work();
> + cancel_work_sync(&floppy_work);
> }
> command_status = FD_COMMAND_NONE;
>
> @@ -992,8 +994,6 @@ static void empty(void)
> {
> }
>
> -static DECLARE_WORK(floppy_work, NULL);
> -
> static void schedule_bh(void (*handler) (void))
> {
> PREPARE_WORK(&floppy_work, (work_func_t)handler);
>

The patch does make it work.

Regards
Mark

2007-05-31 19:22:58

by Oleg Nesterov

Subject: Re: floppy.c soft lockup

On 05/31, Mark Hounschell wrote:
>
> Oleg Nesterov wrote:
> > On 05/31, Mark Hounschell wrote:
> >>
> >> Basically the main RT-process (which is a CPU bound process on processor-2) signals a
> >> thread to do some I/O. That RT-thread (running on the other processor) does a simple
> >
> > If the main RT-process monopolizes processor-2, flush_workqueue() (or cancel_work_sync())
> > can hang of course, we can do nothing.
> >
> >> ioctl(Q->DevSpec1, FDSETPRM, &medprm)
> >>
> >> and there is no return from the call. That thread is hung.
> >
> > What happens if you kill the main RT-process?
> >
>
> When I kill the main process all its threads also go away. Including the floppy thread.
> Nothing notable happens with this kernel.

Aha, I missed the word "thread"; this is a single process.

Still, this means that flush_workqueue() completes when other sub-threads go away,
otherwise the thread doing ioctl() couldn't exit.

Thank you very much.

So, the main question is: is it possible that one of the RT processes/threads pins itself
to some CPU and eats 100% of its CPU power?

> On previous (2.6.18) I would get a dump
> from the floppy driver in the syslog when I killed the process.

Could you send me this output? just in case...

> > --- OLD/drivers/block/floppy.c~ 2007-04-03 13:04:58.000000000 +0400
> > +++ OLD/drivers/block/floppy.c 2007-05-31 20:50:18.000000000 +0400
> > @@ -862,6 +862,8 @@ static void set_fdc(int drive)
> > FDCS->reset = 1;
> > }
> >
> > +static DECLARE_WORK(floppy_work, NULL);
> > +
> > /* locks the driver */
> > static int _lock_fdc(int drive, int interruptible, int line)
> > {
> > @@ -893,7 +895,7 @@ static int _lock_fdc(int drive, int inte
> > set_current_state(TASK_RUNNING);
> > remove_wait_queue(&fdc_wait, &wait);
> >
> > - flush_scheduled_work();
> > + cancel_work_sync(&floppy_work);
> > }
> > command_status = FD_COMMAND_NONE;
> >
> > @@ -992,8 +994,6 @@ static void empty(void)
> > {
> > }
> >
> > -static DECLARE_WORK(floppy_work, NULL);
> > -
> > static void schedule_bh(void (*handler) (void))
> > {
> > PREPARE_WORK(&floppy_work, (work_func_t)handler);
> >
>
> The patch does make it work.

I do not understand floppy.c, absolutely, so I am not sure this patch is correct.

Even if correct, this patch doesn't solve this problem (if we really understand
what's going on). cancel_work_sync() may still hang if floppy_work->func() runs
on the starved CPU. This is unlikely, but possible.

Thanks!

Oleg.

2007-05-31 20:18:29

by Mark Hounschell

Subject: Re: floppy.c soft lockup

Oleg Nesterov wrote:
> On 05/31, Mark Hounschell wrote:
>> Oleg Nesterov wrote:
>>> On 05/31, Mark Hounschell wrote:
>>>> Basically the main RT-process (which is a CPU bound process on processor-2) signals a
>>>> thread to do some I/O. That RT-thread (running on the other processor) does a simple
>>> If the main RT-process monopolizes processor-2, flush_workqueue() (or cancel_work_sync())
>>> can hang of course, we can do nothing.
>>>
>>>> ioctl(Q->DevSpec1, FDSETPRM, &medprm)
>>>>
>>>> and there is no return from the call. That thread is hung.
>>> What happens if you kill the main RT-process?
>>>
>> When I kill the main process all its threads also go away. Including the floppy thread.
>> Nothing notable happens with this kernel.
>
> Aha, I missed the word "thread", this is the single process.
>
> Still, this means that flush_workqueue() completes when other sub-threads go away,
> otherwise the thread doing ioctl() couldn't exit.
>
> Thank you very much.
>
> So, the main question is: is it possible that one of RT processes/threads pins itself
> to some CPU and eats 100% cpu power?
>

The main process is pinned to a processor(2) with all _non-kernel_ processes/threads forced over to processor 1.
Any already affinitized processes or kernel threads are left as is. Only user land stuff is moved. The main process
is for sure _not_ relinquishing its processor(2) intentionally. All the I/O threads, floppy included, are running
on the other processor(1). During this failure only 1 or 2 of the I/O threads are actually doing anything.
I assume that whatever is going on in the kernel/floppy driver on behalf of the floppy thread is being done on processor 1?
Processor 1 has lots of CPU time available. Processor 2 is running balls to the wall.

>> On previous (2.6.18) I would get a dump
>> from the floppy driver in the syslog when I killed the process.
>
> Could you send me this output? just in case...
>

Today, 2.6.18 is doing the same as 2.6.22-rc3. I hate it when that happens. Maybe it was
on my box at home. I'll verify when I get there. Nothing from here now though.

>>> --- OLD/drivers/block/floppy.c~ 2007-04-03 13:04:58.000000000 +0400
>>> +++ OLD/drivers/block/floppy.c 2007-05-31 20:50:18.000000000 +0400
>>> @@ -862,6 +862,8 @@ static void set_fdc(int drive)
>>> FDCS->reset = 1;
>>> }
>>>
>>> +static DECLARE_WORK(floppy_work, NULL);
>>> +
>>> /* locks the driver */
>>> static int _lock_fdc(int drive, int interruptible, int line)
>>> {
>>> @@ -893,7 +895,7 @@ static int _lock_fdc(int drive, int inte
>>> set_current_state(TASK_RUNNING);
>>> remove_wait_queue(&fdc_wait, &wait);
>>>
>>> - flush_scheduled_work();
>>> + cancel_work_sync(&floppy_work);
>>> }
>>> command_status = FD_COMMAND_NONE;
>>>
>>> @@ -992,8 +994,6 @@ static void empty(void)
>>> {
>>> }
>>>
>>> -static DECLARE_WORK(floppy_work, NULL);
>>> -
>>> static void schedule_bh(void (*handler) (void))
>>> {
>>> PREPARE_WORK(&floppy_work, (work_func_t)handler);
>>>
>> The patch does make it work.
>
> I do not understand floppy.c, absolutely, so I am not sure this patch is correct.
>
> Even if correct, this patch doesn't solve this problem (if we really understand
> what's going on). cancel_work_sync() may still hang if floppy_work->func() runs
> on the starved CPU. This is unlikely, but possible.
>
> Thanks!
>
> Oleg.
>

2007-06-01 09:51:56

by Mark Hounschell

Subject: Re: floppy.c soft lockup

Mark Hounschell wrote:
> Oleg Nesterov wrote:
>> On 05/31, Mark Hounschell wrote:
>>> Oleg Nesterov wrote:
>>>> On 05/31, Mark Hounschell wrote:
>>>>> Basically the main RT-process (which is a CPU bound process on processor-2) signals a
>>>>> thread to do some I/O. That RT-thread (running on the other processor) does a simple
>>>> If the main RT-process monopolizes processor-2, flush_workqueue() (or cancel_work_sync())
>>>> can hang of course, we can do nothing.
>>>>
>>>>> ioctl(Q->DevSpec1, FDSETPRM, &medprm)
>>>>>
>>>>> and there is no return from the call. That thread is hung.
>>>> What happens if you kill the main RT-process?
>>>>
>>> When I kill the main process all its threads also go away. Including the floppy thread.
>>> Nothing notable happens with this kernel.
>> Aha, I missed the word "thread", this is the single process.
>>
>> Still, this means that flush_workqueue() completes when other sub-threads go away,
>> otherwise the thread doing ioctl() couldn't exit.
>>
>> Thank you very much.
>>
>> So, the main question is: is it possible that one of RT processes/threads pins itself
>> to some CPU and eats 100% cpu power?
>>
>
> The main process is pinned to a processor(2) with all _non-kernel_ processes/threads forced over to processor 1.
> Any already affinitized processes or kernel threads are left as is. Only user land stuff is moved. The main process
> is for sure _not_ relinquishing it's processor(2) intentionally. All the I/O threads, floppy included, are running
> on the other processor(1). During this failure only 1 or 2 of the I/O threads are actually doing anything.
> I assume that what ever is going on in the kernel/floppy driver on behalf of the floppy thread is being done on processor 1?
> Processor 1 has lots of CPU time available. Processor 2 is running balls to the wall.
>
>>> On previous (2.6.18) I would get a dump
>>> from the floppy driver in the syslog when I killed the process.
>> Could you send me this output? just in case...
>>
>
> Today, 2.6.18 is doing the same as 2.6.22-rc3. I hate it when that happens. Maybe it was
> on my box at home. I'll verify when I get there. Nothing from here now though.
>

Those syslog dumps must have been a result of something I was doing
while trying to pinpoint my problem. I do not get these now. Sorry.

>>>> --- OLD/drivers/block/floppy.c~ 2007-04-03 13:04:58.000000000 +0400
>>>> +++ OLD/drivers/block/floppy.c 2007-05-31 20:50:18.000000000 +0400
>>>> @@ -862,6 +862,8 @@ static void set_fdc(int drive)
>>>> FDCS->reset = 1;
>>>> }
>>>>
>>>> +static DECLARE_WORK(floppy_work, NULL);
>>>> +
>>>> /* locks the driver */
>>>> static int _lock_fdc(int drive, int interruptible, int line)
>>>> {
>>>> @@ -893,7 +895,7 @@ static int _lock_fdc(int drive, int inte
>>>> set_current_state(TASK_RUNNING);
>>>> remove_wait_queue(&fdc_wait, &wait);
>>>>
>>>> - flush_scheduled_work();
>>>> + cancel_work_sync(&floppy_work);
>>>> }
>>>> command_status = FD_COMMAND_NONE;
>>>>
>>>> @@ -992,8 +994,6 @@ static void empty(void)
>>>> {
>>>> }
>>>>
>>>> -static DECLARE_WORK(floppy_work, NULL);
>>>> -
>>>> static void schedule_bh(void (*handler) (void))
>>>> {
>>>> PREPARE_WORK(&floppy_work, (work_func_t)handler);
>>>>
>>> The patch does make it work.
>> I do not understand floppy.c, absolutely, so I am not sure this patch is correct.
>>
>> Even if correct, this patch doesn't solve this problem (if we really understand
>> what's going on). cancel_work_sync() may still hang if floppy_work->func() runs
>> on the starved CPU. This is unlikely, but possible.
>>
>> Thanks!
>>
>> Oleg.
>>

Thanks and Regards
Mark

2007-06-01 11:01:09

by Oleg Nesterov

Subject: Re: floppy.c soft lockup

I hope Ingo will correct me if I am wrong,

On 05/31, Mark Hounschell wrote:
>
> Oleg Nesterov wrote:
> >
> > So, the main question is: is it possible that one of RT processes/threads pins itself
> > to some CPU and eats 100% cpu power?
> >
>
> The main process is pinned to a processor(2) with all _non-kernel_ processes/threads forced over to processor 1.
> Any already affinitized processes or kernel threads are left as is. Only user land stuff is moved. The main process
> is for sure _not_ relinquishing it's processor(2) intentionally.

This means that a non-rt kernel thread bound to CPU 2 can't run; in particular,
events/2. So the problem is not directly connected to floppy.c: no
flush_scheduled_work() (or schedule_on_each_cpu()) can succeed.

You can change irq/X/smp_affinity, but smp_apic_timer_interrupt() still can
queue work_struct on CPU 2 (for example, mm/slab.c uses per-cpu reap_work).
Since events/2 is blocked by the main RT thread, such a work_struct can't be
executed, and so flush_scheduled_work() hangs.

> All the I/O threads, floppy included, are running
> on the other processor(1). During this failure only 1 or 2 of the I/O threads are actually doing anything.
> I assume that what ever is going on in the kernel/floppy driver on behalf of the floppy thread is being done on processor 1?
> Processor 1 has lots of CPU time available.

Yes, but see above. flush_scheduled_work() needs a cooperation from events/2
which is bound to CPU 2.

If you changed irq/X/smp_affinity, the patch I sent should help, because
floppy_work can't be scheduled on CPU 2, but still I don't think it is right
to run 100% cpu-bound RT-process.

Oleg.

2007-06-01 14:10:35

by Mark Hounschell

Subject: Re: floppy.c soft lockup

Oleg Nesterov wrote:
> I hope Ingo will correct me if I am wrong,
>
> On 05/31, Mark Hounschell wrote:
>> Oleg Nesterov wrote:
>>> So, the main question is: is it possible that one of RT processes/threads pins itself
>>> to some CPU and eats 100% cpu power?
>>>
>> The main process is pinned to a processor(2) with all _non-kernel_ processes/threads forced over to processor 1.
>> Any already affinitized processes or kernel threads are left as is. Only user land stuff is moved. The main process
>> is for sure _not_ relinquishing it's processor(2) intentionally.
>
> This means that a non-rt kernel thread bound to CPU 2 can't run. In particular,
> events/2. This means that the problem is not directly connected to floppy.c,
> any flush_scheduled_work() (or schedule_on_each_cpu()) can't succeed.
>

Well, I have multiple I/O threads for many other types of I/O that don't have any problems.
And until these changes in 2.6.18 I didn't have any problems with the floppy either. I have multiple
ethernet threads, multiple scsi (SG) device threads, multiple rs232 device threads, a parallel port
thread, and others, with no problems.


> You can change irq/X/smp_affinity, but smp_apic_timer_interrupt() still can
> queue work_struct on CPU 2 (for example, mm/slab.c uses per-cpu reap_work).
> Since events/2 is blocked by the main RT thread, such a work_struct can't be
> executed, and so flush_scheduled_work() hangs.
>

I don't mean to sound stupid, but why would a process running on processor 1 require anything
from events/2 when there is an events/1? Forgive my ignorance, please.

>> All the I/O threads, floppy included, are running
>> on the other processor(1). During this failure only 1 or 2 of the I/O threads are actually doing anything.
>> I assume that what ever is going on in the kernel/floppy driver on behalf of the floppy thread is being done on processor 1?
>> Processor 1 has lots of CPU time available.
>
> Yes, but see above. flush_scheduled_work() needs a cooperation from events/2
> which is bound to CPU 2.
>

Again, I don't understand why flush_scheduled_work() running on behalf of a process
affinitized to processor-1 requires cooperation from events/2 (affinitized to processor-2)
when there is an events/1 already affinitized to processor 1. Again though, forgive my
ignorance please.

> If you changed irq/X/smp_affinity, the patch I sent should help, because
> floppy_work can't be scheduled on CPU 2, but still I don't think it is right
> to run 100% cpu-bound RT-process.
>
> Oleg.
>

The patch you sent helps with no other intervention from me. But then so does
the patch mentioned in the original post. I am able to bang on the floppies pretty
hard doing all kinds of things with no trouble using either.

As far as a 100% cpu-bound RT-process goes, well I say I don't intentionally relinquish
the processor but it's not really 100% cpu-bound. Running xosview I see some spare time.

Thanks
Mark

2007-06-01 15:16:18

by Oleg Nesterov

Subject: Re: floppy.c soft lockup

On 06/01, Mark Hounschell wrote:
>
> Oleg Nesterov wrote:
> > Yes, but see above. flush_scheduled_work() needs a cooperation from events/2
> > which is bound to CPU 2.
> >
>
> Again I don't understand why flush_scheduled_work() running on behalf of a process
> affinitized to processor-1 requires cooperation from events/2 (affinitized to processor-2)
> when there is an events/1 already affinitized to processor 1?

flush_workqueue() blocks until any scheduled work on any CPU has run to
completion. If we have some work_struct pending on CPU 2, it can be completed
only when events/2 executes it.

> > If you changed irq/X/smp_affinity, the patch I sent should help, because
> > floppy_work can't be scheduled on CPU 2, but still I don't think it is right
> > to run 100% cpu-bound RT-process.
>
> The patch you sent helps with no other intervention from me. But then so does
> the patch mentioned in the original post. I am able to bang on the floppies pretty
> hard doing all kinds of things with no trouble using either.

This patch replaces flush_scheduled_work() with cancel_work_sync(). The latter
can still hang if the floppy interrupt happens on CPU 2 and does schedule_bh():
events/2 starts running floppy_work->func() and is preempted by the RT-thread. This is
very unlikely, but possible.

From your original post:
>
> The application only runs on SMP machines and uses process and irq affinities

if the floppy interrupt can't happen on CPU 2, the above scenario is not possible.

> As far as a 100% cpu-bound RT-process goes, well I say I don't intentionally relinquish
> the processor but it's not really 100% cpu-bound. Running xosview I see some spare time.

Well, I don't know what xosview is, sorry :) so I don't understand what
"spare time" precisely means. If this thread does some i/o or something which
can sleep, then...

OK. In that case we may have another reason for deadlock, say a pending
floppy_work needs open_lock or test_and_set_bit(0, &fdc_busy).

Could you apply the trivial patch below, and change the i/o thread to do

prctl(1234); // hangs ???
printf(something);
ioctl(Q->DevSpec1, FDSETPRM, &medprm); // this hangs

to see if prctl() hangs or not? This way we can narrow the problem.
(of course, you can just kill the above ioctl() if this is possible).

Thanks!

Oleg.

--- OLD/kernel/sys.c~ 2007-04-03 13:05:02.000000000 +0400
+++ OLD/kernel/sys.c 2007-06-01 18:56:22.000000000 +0400
@@ -2147,6 +2147,11 @@ asmlinkage long sys_prctl(int option, un
{
long error;

+ if (option == 1234) {
+ flush_scheduled_work();
+ return 0;
+ }
+
error = security_task_prctl(option, arg2, arg3, arg4, arg5);
if (error)
return error;

2007-06-01 17:12:16

by Mark Hounschell

Subject: Re: floppy.c soft lockup

Oleg Nesterov wrote:
> On 06/01, Mark Hounschell wrote:
>> Oleg Nesterov wrote:
>>> Yes, but see above. flush_scheduled_work() needs a cooperation from events/2
>>> which is bound to CPU 2.
>>>
>> Again I don't understand why flush_scheduled_work() running on behalf of a process
>> affinitized to processor-1 requires cooperation from events/2 (affinitized to processor-2)
>> when there is an events/1 already affinitized to processor 1?
>
> flush_workqueue() blocks until any scheduled work on any CPU has run to
> completion. If we have some work_struct pending on CPU 2, it can be completed
> only when events/2 executes it.
>
>>> If you changed irq/X/smp_affinity, the patch I sent should help, because
>>> floppy_work can't be scheduled on CPU 2, but still I don't think it is right
>>> to run 100% cpu-bound RT-process.
>> The patch you sent helps with no other intervention from me. But then so does
>> the patch mentioned in the original post. I am able to bang on the floppies pretty
>> hard doing all kinds of things with no trouble using either.
>
> This patch replaces flush_scheduled_work() with cancel_work_sync(). The latter
> can still hang if the floppy interrupt happens on CPU 2 and does schedule_bh(),
> events/2 starts running floppy_work->func() and preempted by RT-thread. This is
> very unlikely, but possible.
>
> From your original post:
> >
> > The application only runs on SMP machines and uses process and irq affinities
>
> if the floppy interrupt can't happen on CPU 2, the above scenario is not possible.
>

All the irq affinities but one are set to processor-1. The only one that is not
comes from an rtom (Real-Time Option Module); its irq is handled by
processor-2.

>> As far as a 100% cpu-bound RT-process goes, well I say I don't intentionally relinquish
>> the processor but it's not really 100% cpu-bound. Running xosview I see some spare time.
>
> Well, I don't know what is xosview, sorry :) so I don't understand what does
> "spare time" precisely mean. If this thread does some i/o or something which
> can sleep, then...
>

I don't understand the _real_ meaning of spare time either, but xosview
is just a little graphical window showing information obtained from the
proc fs, I think.

> OK. In that case we may have another reason for deadlock, say a pending
> floppy_work needs open_lock or test_and_set_bit(0, &fdc_busy).
>
> Could you apply the trivial patch below, and change the i/o thread to do
>
> prctl(1234); // hangs ???
> printf(something);
> ioctl(Q->DevSpec1, FDSETPRM, &medprm); // this hangs
>
> to see if prctl() hangs or not? This way we can narrow the problem.
> (of course, you can just kill the above ioctl() if this is possible).
>
> Thanks!
>
> Oleg.
>
> --- OLD/kernel/sys.c~ 2007-04-03 13:05:02.000000000 +0400
> +++ OLD/kernel/sys.c 2007-06-01 18:56:22.000000000 +0400
> @@ -2147,6 +2147,11 @@ asmlinkage long sys_prctl(int option, un
> {
> long error;
>
> + if (option == 1234) {
> + flush_scheduled_work();
> + return 0;
> + }
> +
> error = security_task_prctl(option, arg2, arg3, arg4, arg5);
> if (error)
> return error;
>
> -


OK, the prctl never returned. I just replaced the ioctl with it and added
a printf before and after. I only get the one before. The thread is hung
at this point, just as if I'd done the ioctl.

Regards
Mark

2007-06-01 18:36:45

by Oleg Nesterov

Subject: Re: floppy.c soft lockup

On 06/01, Mark Hounschell wrote:
>
> Oleg Nesterov wrote:
> >
> > Could you apply the trivial patch below, and change the i/o thread to do
> >
> > prctl(1234); // hangs ???
> > printf(something);
> > ioctl(Q->DevSpec1, FDSETPRM, &medprm); // this hangs
> >
> > to see if prctl() hangs or not? This way we can narrow the problem.
> > (of course, you can just kill the above ioctl() if this is possible).
> >
> > Thanks!
> >
> > Oleg.
> >
> > --- OLD/kernel/sys.c~ 2007-04-03 13:05:02.000000000 +0400
> > +++ OLD/kernel/sys.c 2007-06-01 18:56:22.000000000 +0400
> > @@ -2147,6 +2147,11 @@ asmlinkage long sys_prctl(int option, un
> > {
> > long error;
> >
> > + if (option == 1234) {
> > + flush_scheduled_work();
> > + return 0;
> > + }
> > +
> > error = security_task_prctl(option, arg2, arg3, arg4, arg5);
> > if (error)
> > return error;
> >
> > -
>
>
> Ok the prctl never returned. I just replaced the ioctl with it and added
> a printf before and after. I only get the one before. The thread is hung
> at this point just as if I'd done the ioctl?

Thanks. So we can rule out floppy.c. flush_scheduled_work/flush_workqueue
is broken by this RT application. Imho, this is not a kernel problem.

Now I am very sure that the initial suspicion was correct: CPU starvation.
I can cook up a debug patch tomorrow to be 100% sure; which kernel version is
most convenient for you?

Oleg.

2007-06-01 19:52:57

by Mark Hounschell

Subject: Re: floppy.c soft lockup

Oleg Nesterov wrote:
> On 06/01, Mark Hounschell wrote:
>> Oleg Nesterov wrote:
>>> Could you apply the trivial patch below, and change the i/o thread to do
>>>
>>> prctl(1234); // hangs ???
>>> printf(something);
>>> ioctl(Q->DevSpec1, FDSETPRM, &medprm); // this hangs
>>>
>>> to see if prctl() hangs or not? This way we can narrow the problem.
>>> (of course, you can just kill the above ioctl() if this is possible).
>>>
>>> Thanks!
>>>
>>> Oleg.
>>>
>>> --- OLD/kernel/sys.c~ 2007-04-03 13:05:02.000000000 +0400
>>> +++ OLD/kernel/sys.c 2007-06-01 18:56:22.000000000 +0400
>>> @@ -2147,6 +2147,11 @@ asmlinkage long sys_prctl(int option, un
>>> {
>>> long error;
>>>
>>> + if (option == 1234) {
>>> + flush_scheduled_work();
>>> + return 0;
>>> + }
>>> +
>>> error = security_task_prctl(option, arg2, arg3, arg4, arg5);
>>> if (error)
>>> return error;
>>>
>>> -
>>
>> Ok the prctl never returned. I just replaced the ioctl with it and added
>> a printf before and after. I only get the one before. The thread is hung
>> at this point just as if I'd done the ioctl?
>
> Thanks. So we can rule out floppy.c. flush_scheduled_work/flush_workqueue
> is broken by this RT application. Imho, this is not the kernel problem.
>
> Now I am very sure that the initial suspect was correct: cpu starvation.
> I can cook a debug patch to be 100% sure tomorrow, which kernel version is
> most convenient to you?
>

2.6.22-rc3 is fine thanks.

Regards
Mark

2007-06-02 12:30:47

by Oleg Nesterov

Subject: Re: floppy.c soft lockup

On 06/01, Mark Hounschell wrote:
>
> Oleg Nesterov wrote:
> > On 06/01, Mark Hounschell wrote:
> >>
> >> Ok the prctl never returned. I just replaced the ioctl with it and added
> >> a printf before and after. I only get the one before. The thread is hung
> >> at this point just as if I'd done the ioctl?
> >
> > Thanks. So we can rule out floppy.c. flush_scheduled_work/flush_workqueue
> > is broken by this RT application. Imho, this is not the kernel problem.
> >
> > Now I am very sure that the initial suspect was correct: cpu starvation.
> > I can cook a debug patch to be 100% sure tomorrow, which kernel version is
> > most convenient to you?
> >
>
> 2.6.22-rc3 is fine thanks.

Please try this patch; it should dump some debug info when flush_workqueue()
hangs (after 30 seconds). You can use it with or without the previous patch
I sent. Please wait for a couple of minutes to collect more info.

Oleg.

--- OLD/kernel/sched.c~TST 2007-04-05 12:20:35.000000000 +0400
+++ OLD/kernel/sched.c 2007-06-02 15:41:53.000000000 +0400
@@ -4177,6 +4177,20 @@ struct task_struct *idle_task(int cpu)
return cpu_rq(cpu)->idle;
}

+struct task_struct *get_cpu_curr(int cpu)
+{
+ unsigned long flags;
+ struct task_struct *curr;
+ struct rq *rq = cpu_rq(cpu);
+
+ spin_lock_irqsave(&rq->lock, flags);
+ curr = rq->curr;
+ get_task_struct(curr);
+ spin_unlock_irqrestore(&rq->lock, flags);
+
+ return curr;
+}
+
/**
* find_process_by_pid - find a process with a matching PID value.
* @pid: the pid in question.
--- OLD/kernel/workqueue.c~TST 2007-06-02 13:34:57.000000000 +0400
+++ OLD/kernel/workqueue.c 2007-06-02 16:18:02.000000000 +0400
@@ -49,6 +49,7 @@ struct cpu_workqueue_struct {
struct task_struct *thread;

int run_depth; /* Detect run_workqueue() recursion depth */
+ int jobs;
} ____cacheline_aligned;

/*
@@ -253,6 +254,7 @@ static void run_workqueue(struct cpu_wor

cwq->current_work = work;
list_del_init(cwq->worklist.next);
+ cwq->jobs++;
spin_unlock_irq(&cwq->lock);

BUG_ON(get_wq_data(work) != cwq);
@@ -328,7 +330,48 @@ static void insert_wq_barrier(struct cpu
insert_work(cwq, &barr->work, tail);
}

-static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
+extern struct task_struct *get_cpu_curr(int cpu);
+
+static void flush_wait(struct cpu_workqueue_struct *cwq, int cpu, struct completion *done)
+{
+ struct task_struct *curr;
+ struct work_struct *work;
+ int old_pid, jobs;
+
+ if (is_single_threaded(cwq->wq))
+ cpu = raw_smp_processor_id();
+
+again:
+ work = cwq->current_work;
+ jobs = cwq->jobs;
+
+ curr = get_cpu_curr(cpu);
+ old_pid = curr->pid;
+ put_task_struct(curr);
+
+ if (wait_for_completion_timeout(done, HZ * 30))
+ return;
+
+ printk(KERN_ERR "ERR!! %s flush hang: %p %p %d %d\n", cwq->thread->comm,
+ work, cwq->current_work, jobs, cwq->jobs);
+
+ curr = get_cpu_curr(cpu);
+ printk(KERN_ERR "CURR: %d %d %s %ld %ld\n", old_pid, curr->pid,
+ curr->comm, curr->nivcsw, curr->nvcsw);
+ put_task_struct(curr);
+
+ spin_lock_irq(&cwq->lock);
+ list_for_each_entry(work, &cwq->worklist, entry)
+ print_symbol(" %s\n", (unsigned long) work->func);
+ printk(" ----\n");
+ if (cwq->current_work)
+ print_symbol(" %s\n", (unsigned long) cwq->current_work->func);
+ spin_unlock_irq(&cwq->lock);
+
+ goto again;
+}
+
+static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq, int cpu)
{
int active;

@@ -351,7 +394,7 @@ static int flush_cpu_workqueue(struct cp
spin_unlock_irq(&cwq->lock);

if (active)
- wait_for_completion(&barr.done);
+ flush_wait(cwq, cpu, &barr.done);
}

return active;
@@ -377,7 +420,7 @@ void fastcall flush_workqueue(struct wor

might_sleep();
for_each_cpu_mask(cpu, *cpu_map)
- flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
+ flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu), cpu);
}
EXPORT_SYMBOL_GPL(flush_workqueue);

@@ -748,7 +791,7 @@ static void cleanup_workqueue_thread(str
* checks list_empty(), and a "normal" queue_work() can't use
* a dead CPU.
*/
- while (flush_cpu_workqueue(cwq))
+ while (flush_cpu_workqueue(cwq, cpu))
;

kthread_stop(cwq->thread);

2007-06-02 20:44:38

by Mark Hounschell

[permalink] [raw]
Subject: Re: floppy.c soft lockup

Oleg Nesterov wrote:
> On 06/01, Mark Hounschell wrote:
>> Oleg Nesterov wrote:
>>> On 06/01, Mark Hounschell wrote:
>>>> Ok the prctl never returned. I just replaced the ioctl with it and added
>>>> a printf before and after. I only get the one before. The thread is hung
>>>> at this point just as if I'd done the ioctl?
>>> Thanks. So we can rule out floppy.c. flush_scheduled_work/flush_workqueue
>>> is broken by this RT application. Imho, this is not the kernel problem.
>>>
>>> Now I am very sure that the initial suspect was correct: cpu starvation.
>>> I can cook a debug patch to be 100% sure tomorrow, which kernel version is
>>> most convenient to you?
>>>
>> 2.6.22-rc3 is fine thanks.
>
> Please try this patch, it should dump some debug info when flush_workqueue()
> hangs (after 30 seconds). You can use it with or without the previous patch
> I sent. Please wait for a couple of minutes to collect more info.
>
> Oleg.
>
> --- OLD/kernel/sched.c~TST 2007-04-05 12:20:35.000000000 +0400
> +++ OLD/kernel/sched.c 2007-06-02 15:41:53.000000000 +0400
> @@ -4177,6 +4177,20 @@ struct task_struct *idle_task(int cpu)
> return cpu_rq(cpu)->idle;
> }
>
> +struct task_struct *get_cpu_curr(int cpu)
> +{
> + unsigned long flags;
> + struct task_struct *curr;
> + struct rq *rq = cpu_rq(cpu);
> +
> + spin_lock_irqsave(&rq->lock, flags);
> + curr = rq->curr;
> + get_task_struct(curr);
> + spin_unlock_irqrestore(&rq->lock, flags);
> +
> + return curr;
> +}
> +
> /**
> * find_process_by_pid - find a process with a matching PID value.
> * @pid: the pid in question.
> --- OLD/kernel/workqueue.c~TST 2007-06-02 13:34:57.000000000 +0400
> +++ OLD/kernel/workqueue.c 2007-06-02 16:18:02.000000000 +0400
> @@ -49,6 +49,7 @@ struct cpu_workqueue_struct {
> struct task_struct *thread;
>
> int run_depth; /* Detect run_workqueue() recursion depth */
> + int jobs;
> } ____cacheline_aligned;
>
> /*
> @@ -253,6 +254,7 @@ static void run_workqueue(struct cpu_wor
>
> cwq->current_work = work;
> list_del_init(cwq->worklist.next);
> + cwq->jobs++;
> spin_unlock_irq(&cwq->lock);
>
> BUG_ON(get_wq_data(work) != cwq);
> @@ -328,7 +330,48 @@ static void insert_wq_barrier(struct cpu
> insert_work(cwq, &barr->work, tail);
> }
>
> -static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
> +extern struct task_struct *get_cpu_curr(int cpu);
> +
> +static void flush_wait(struct cpu_workqueue_struct *cwq, int cpu, struct completion *done)
> +{
> + struct task_struct *curr;
> + struct work_struct *work;
> + int old_pid, jobs;
> +
> + if (is_single_threaded(cwq->wq))
> + cpu = raw_smp_processor_id();
> +
> +again:
> + work = cwq->current_work;
> + jobs = cwq->jobs;
> +
> + curr = get_cpu_curr(cpu);
> + old_pid = curr->pid;
> + put_task_struct(curr);
> +
> + if (wait_for_completion_timeout(done, HZ * 30))
> + return;
> +
> + printk(KERN_ERR "ERR!! %s flush hang: %p %p %d %d\n", cwq->thread->comm,
> + work, cwq->current_work, jobs, cwq->jobs);
> +
> + curr = get_cpu_curr(cpu);
> + printk(KERN_ERR "CURR: %d %d %s %ld %ld\n", old_pid, curr->pid,
> + curr->comm, curr->nivcsw, curr->nvcsw);
> + put_task_struct(curr);
> +
> + spin_lock_irq(&cwq->lock);
> + list_for_each_entry(work, &cwq->worklist, entry)
> + print_symbol(" %s\n", (unsigned long) work->func);
> + printk(" ----\n");
> + if (cwq->current_work)
> + print_symbol(" %s\n", (unsigned long) cwq->current_work->func);
> + spin_unlock_irq(&cwq->lock);
> +
> + goto again;
> +}
> +
> +static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq, int cpu)
> {
> int active;
>
> @@ -351,7 +394,7 @@ static int flush_cpu_workqueue(struct cp
> spin_unlock_irq(&cwq->lock);
>
> if (active)
> - wait_for_completion(&barr.done);
> + flush_wait(cwq, cpu, &barr.done);
> }
>
> return active;
> @@ -377,7 +420,7 @@ void fastcall flush_workqueue(struct wor
>
> might_sleep();
> for_each_cpu_mask(cpu, *cpu_map)
> - flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
> + flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu), cpu);
> }
> EXPORT_SYMBOL_GPL(flush_workqueue);
>
> @@ -748,7 +791,7 @@ static void cleanup_workqueue_thread(str
> * checks list_empty(), and a "normal" queue_work() can't use
> * a dead CPU.
> */
> - while (flush_cpu_workqueue(cwq))
> + while (flush_cpu_workqueue(cwq, cpu))
> ;
>
> kthread_stop(cwq->thread);
>



Jun 2 16:36:11 harley kernel: ERR!! events/1 flush hang: c201dbc0 c201dbc0 10012 10012
Jun 2 16:36:11 harley kernel: CURR: 7974 7974 vrsx 93 26
Jun 2 16:36:11 harley kernel: wq_barrier_func+0x0/0x8
Jun 2 16:36:11 harley kernel: vmstat_update+0x0/0x24
Jun 2 16:36:11 harley kernel: ----
Jun 2 16:36:11 harley kernel: cache_reap+0x0/0xf4
Jun 2 16:36:41 harley kernel: ERR!! events/1 flush hang: c201dbc0 c201dbc0 10012 10012
Jun 2 16:36:41 harley kernel: CURR: 7974 7974 vrsx 93 26
Jun 2 16:36:41 harley kernel: wq_barrier_func+0x0/0x8
Jun 2 16:36:41 harley kernel: vmstat_update+0x0/0x24
Jun 2 16:36:41 harley kernel: ----
Jun 2 16:36:41 harley kernel: cache_reap+0x0/0xf4
Jun 2 16:37:11 harley kernel: ERR!! events/1 flush hang: c201dbc0 c201dbc0 10012 10012
Jun 2 16:37:11 harley kernel: CURR: 7974 7974 vrsx 93 26
Jun 2 16:37:11 harley kernel: wq_barrier_func+0x0/0x8
Jun 2 16:37:11 harley kernel: vmstat_update+0x0/0x24
Jun 2 16:37:11 harley kernel: ----
Jun 2 16:37:11 harley kernel: cache_reap+0x0/0xf4
Jun 2 16:37:41 harley kernel: ERR!! events/1 flush hang: c201dbc0 c201dbc0 10012 10012
Jun 2 16:37:41 harley kernel: CURR: 7974 7974 vrsx 93 26
Jun 2 16:37:41 harley kernel: wq_barrier_func+0x0/0x8
Jun 2 16:37:41 harley kernel: vmstat_update+0x0/0x24
Jun 2 16:37:41 harley kernel: ----
Jun 2 16:37:41 harley kernel: cache_reap+0x0/0xf4
Jun 2 16:37:51 harley kernel: RTOM: In int handler for 12 usec.
Jun 2 16:38:11 harley kernel: ERR!! events/1 flush hang: c201dbc0 c201dbc0 10012 10012
Jun 2 16:38:11 harley kernel: CURR: 7974 7974 vrsx 93 26
Jun 2 16:38:11 harley kernel: wq_barrier_func+0x0/0x8
Jun 2 16:38:11 harley kernel: vmstat_update+0x0/0x24
Jun 2 16:38:11 harley kernel: ----
Jun 2 16:38:11 harley kernel: cache_reap+0x0/0xf4
Jun 2 16:38:41 harley kernel: ERR!! events/1 flush hang: c201dbc0 c201dbc0 10012 10012
Jun 2 16:38:41 harley kernel: CURR: 7974 7974 vrsx 93 26
Jun 2 16:38:41 harley kernel: wq_barrier_func+0x0/0x8
Jun 2 16:38:41 harley kernel: vmstat_update+0x0/0x24
Jun 2 16:38:41 harley kernel: ----
Jun 2 16:38:41 harley kernel: cache_reap+0x0/0xf4
Jun 2 16:39:11 harley kernel: ERR!! events/1 flush hang: c201dbc0 c201dbc0 10012 10012
Jun 2 16:39:11 harley kernel: CURR: 7974 7974 vrsx 93 26
Jun 2 16:39:11 harley kernel: wq_barrier_func+0x0/0x8
Jun 2 16:39:11 harley kernel: vmstat_update+0x0/0x24
Jun 2 16:39:11 harley kernel: ----
Jun 2 16:39:11 harley kernel: cache_reap+0x0/0xf4
Jun 2 16:39:41 harley kernel: ERR!! events/1 flush hang: c201dbc0 c201dbc0 10012 10012
Jun 2 16:39:41 harley kernel: CURR: 7974 7974 vrsx 93 26
Jun 2 16:39:41 harley kernel: wq_barrier_func+0x0/0x8
Jun 2 16:39:41 harley kernel: vmstat_update+0x0/0x24
Jun 2 16:39:41 harley kernel: ----
Jun 2 16:39:41 harley kernel: cache_reap+0x0/0xf4
Jun 2 16:40:11 harley kernel: ERR!! events/1 flush hang: c201dbc0 c201dbc0 10012 10012
Jun 2 16:40:11 harley kernel: CURR: 7974 7974 vrsx 93 26
Jun 2 16:40:11 harley kernel: wq_barrier_func+0x0/0x8
Jun 2 16:40:11 harley kernel: vmstat_update+0x0/0x24
Jun 2 16:40:11 harley kernel: ----
Jun 2 16:40:11 harley kernel: cache_reap+0x0/0xf4
Jun 2 16:40:41 harley kernel: ERR!! events/1 flush hang: c201dbc0 c201dbc0 10012 10012
Jun 2 16:40:41 harley kernel: CURR: 7974 7974 vrsx 93 26
Jun 2 16:40:41 harley kernel: wq_barrier_func+0x0/0x8
Jun 2 16:40:41 harley kernel: vmstat_update+0x0/0x24
Jun 2 16:40:41 harley kernel: ----
Jun 2 16:40:41 harley kernel: cache_reap+0x0/0xf4


Regards
Mark

2007-06-03 08:14:33

by Oleg Nesterov

[permalink] [raw]
Subject: Re: floppy.c soft lockup

On 06/02, Mark Hounschell wrote:
>
> Jun 2 16:36:11 harley kernel: ERR!! events/1 flush hang: c201dbc0 c201dbc0 10012 10012
> Jun 2 16:36:11 harley kernel: CURR: 7974 7974 vrsx 93 26
> Jun 2 16:36:11 harley kernel: wq_barrier_func+0x0/0x8
> Jun 2 16:36:11 harley kernel: vmstat_update+0x0/0x24
> Jun 2 16:36:11 harley kernel: ----
> Jun 2 16:36:11 harley kernel: cache_reap+0x0/0xf4

As expected.

Note that ->nivcsw/->nvcsw don't change. There is no "spare time"
on CPU 1; "vrsx" monopolizes the CPU. events/1->cache_reap() was
preempted by vrsx and has had no chance to run since. Note that
jobs == 10012 doesn't change either. I forgot to print
cwq->thread->state, but it should be TASK_RUNNING. It would not
be possible to kill vrsx if cache_reap() stalled.

I don't think this is a kernel problem; vrsx breaks flush_workqueue().
Ingo can answer authoritatively, but I think SCHED_RR/SCHED_FIFO were
not designed for tasks that are 100% cpu-bound.

That said, I think it makes sense to get rid of flush_scheduled_work()
in floppy.c.

Thanks!

Oleg.

2007-06-04 14:01:15

by Mark Hounschell

[permalink] [raw]
Subject: Re: floppy.c soft lockup

Oleg Nesterov wrote:
> On 06/02, Mark Hounschell wrote:
>> Jun 2 16:36:11 harley kernel: ERR!! events/1 flush hang: c201dbc0 c201dbc0 10012 10012
>> Jun 2 16:36:11 harley kernel: CURR: 7974 7974 vrsx 93 26
>> Jun 2 16:36:11 harley kernel: wq_barrier_func+0x0/0x8
>> Jun 2 16:36:11 harley kernel: vmstat_update+0x0/0x24
>> Jun 2 16:36:11 harley kernel: ----
>> Jun 2 16:36:11 harley kernel: cache_reap+0x0/0xf4
>
> As expected.
>
> Note that ->nivcsw/->nvcsw doesn't change. There is no "spare time"
> on CPU 1, "vrsx" monopolizes CPU. events/1->cache_reap() was preempted
> by vrsx, it had no chance to run since then. Note that jobs == 7974
> doesn't change too. I forgot to print cwq->thread->state, but it should
> be TASK_RUNNING. It would not be possible to kill vrsx if cache_reap()
> stalled.
>
> I don't think this is a kernel problem, vrsx breaks flush_workqueue().
> Ingo can answer authoritatively, but I think SCHED_RR/SCHED_FIFO were
> not designed to be 100% cpu-bound.
>
> That said, I think it makes sense to get rid of flush_scheduled_work()
> in floppy.c.
>

Oleg, thanks for your time in diagnosing this.

As far as a 100% CPU bound task being a valid thing to do, it has been
done for many years on SMP machines. Any kernel limitation on this
surely must be considered a bug?

Thanks again
Regards
Mark

2007-06-06 13:12:26

by Mark Hounschell

[permalink] [raw]
Subject: Re: floppy.c soft lockup

Mark Hounschell wrote:
> Oleg Nesterov wrote:
>> On 06/02, Mark Hounschell wrote:
>>> Jun 2 16:36:11 harley kernel: ERR!! events/1 flush hang: c201dbc0 c201dbc0 10012 10012
>>> Jun 2 16:36:11 harley kernel: CURR: 7974 7974 vrsx 93 26
>>> Jun 2 16:36:11 harley kernel: wq_barrier_func+0x0/0x8
>>> Jun 2 16:36:11 harley kernel: vmstat_update+0x0/0x24
>>> Jun 2 16:36:11 harley kernel: ----
>>> Jun 2 16:36:11 harley kernel: cache_reap+0x0/0xf4
>> As expected.
>>
>> Note that ->nivcsw/->nvcsw doesn't change. There is no "spare time"
>> on CPU 1, "vrsx" monopolizes CPU. events/1->cache_reap() was preempted
>> by vrsx, it had no chance to run since then. Note that jobs == 7974
>> doesn't change too. I forgot to print cwq->thread->state, but it should
>> be TASK_RUNNING. It would not be possible to kill vrsx if cache_reap()
>> stalled.
>>
>> I don't think this is a kernel problem, vrsx breaks flush_workqueue().
>> Ingo can answer authoritatively, but I think SCHED_RR/SCHED_FIFO were
>> not designed to be 100% cpu-bound.
>>
>> That said, I think it makes sense to get rid of flush_scheduled_work()
>> in floppy.c.
>>
>
> Oleg, thanks for your time in diagnosing this.
>
> As far as a 100% CPU bound task being a valid thing to do, it has been
> done for many years on SMP machines. Any kernel limitation on this
> surely must be considered a bug?
>

Could someone authoritatively comment on this? Is a SCHED_RR/SCHED_FIFO
100% Cpu bound process supported in an SMP env on Linux? (vanilla or -rt)

Thanks and Regards
Mark

2007-06-06 17:29:20

by Andrew Morton

[permalink] [raw]
Subject: Re: floppy.c soft lockup

On Wed, 06 Jun 2007 09:12:04 -0400 Mark Hounschell <[email protected]> wrote:

> >
> > As far as a 100% CPU bound task being a valid thing to do, it has been
> > done for many years on SMP machines. Any kernel limitation on this
> > surely must be considered a bug?
> >
>
> Could someone authoritatively comment on this? Is a SCHED_RR/SCHED_FIFO
> 100% Cpu bound process supported in an SMP env on Linux? (vanilla or -rt)

It will kill the kernel, sorry.

The only way in which we can fix that is to allow kernel threads to preempt
rt-priority userspace threads. But if we were to do that (to benefit the
few) it would cause _all_ people's rt-prio processes to experience glitches
due to kernel activity, which we believe to be worse.

So we're between a rock and a hard place here.

If we really did want to solve this then I guess the kernel would need some
new code to detect a 100%-busy rt-prio process and to then start permitting
preemption of it for kernel thread activity. That detector would need to
be smart enough to detect a number of 100%-busy rt-prio processes which are
yielding to each other, and one rt-prio process which keeps forking others,
etc. It might get tricky.

2007-06-07 01:32:48

by Matt Mackall

[permalink] [raw]
Subject: Re: floppy.c soft lockup

On Wed, Jun 06, 2007 at 10:28:28AM -0700, Andrew Morton wrote:
> On Wed, 06 Jun 2007 09:12:04 -0400 Mark Hounschell <[email protected]> wrote:
>
> > >
> > > As far as a 100% CPU bound task being a valid thing to do, it has been
> > > done for many years on SMP machines. Any kernel limitation on this
> > > surely must be considered a bug?
> > >
> >
> > Could someone authoritatively comment on this? Is a SCHED_RR/SCHED_FIFO
> > 100% Cpu bound process supported in an SMP env on Linux? (vanilla or -rt)
>
> It will kill the kernel, sorry.
>
> The only way in which we can fix that is to allow kernel threads to preempt
> rt-priority userspace threads. But if we were to do that (to benefit the
> few) it would cause _all_ people's rt-prio processes to experience glitches
> due to kernel activity, which we believe to be worse.
>
> So we're between a rock and a hard place here.
>
> If we really did want to solve this then I guess the kernel would need some
> new code to detect a 100%-busy rt-prio process and to then start permitting
> preemption of it for kernel thread activity. That detector would need to
> be smart enough to detect a number of 100%-busy rt-prio processes which are
> yielding to each other, and one rt-prio process which keeps forking others,
> etc. It might get tricky.

The usual alternative is to manually chrt the relevant kernel threads
to RT priority and adjust the priority scheme of their processes appropriately.

--
Mathematics is the supreme nostalgia of our time.

2007-06-07 10:19:34

by Mark Hounschell

[permalink] [raw]
Subject: Re: floppy.c soft lockup

Matt Mackall wrote:
> On Wed, Jun 06, 2007 at 10:28:28AM -0700, Andrew Morton wrote:
>> On Wed, 06 Jun 2007 09:12:04 -0400 Mark Hounschell <[email protected]> wrote:
>>
>>>> As far as a 100% CPU bound task being a valid thing to do, it has been
>>>> done for many years on SMP machines. Any kernel limitation on this
>>>> surely must be considered a bug?
>>>>
>>> Could someone authoritatively comment on this? Is a SCHED_RR/SCHED_FIFO
>>> 100% Cpu bound process supported in an SMP env on Linux? (vanilla or -rt)
>> It will kill the kernel, sorry.
>>
>> The only way in which we can fix that is to allow kernel threads to preempt
>> rt-priority userspace threads. But if we were to do that (to benefit the
>> few) it would cause _all_ people's rt-prio processes to experience glitches
>> due to kernel activity, which we believe to be worse.
>>
>> So we're between a rock and a hard place here.
>>
>> If we really did want to solve this then I guess the kernel would need some
>> new code to detect a 100%-busy rt-prio process and to then start permitting
>> preemption of it for kernel thread activity. That detector would need to
>> be smart enough to detect a number of 100%-busy rt-prio processes which are
>> yielding to each other, and one rt-prio process which keeps forking others,
>> etc. It might get tricky.
>
> The usual alternative is to manually chrt the relevant kernel threads
> to RT priority and adjust the priority scheme of their processes appropriately.
>

From an earlier message in the thread:

>> Mark writes:
>> Again I don't understand why flush_scheduled_work() running on behalf
>> of a process affinitized to processor-1 requires cooperation from
>> events/2 (affinitized to processor-2)
>> when there is an events/1 already affinitized to processor 1?

>Oleg writes:
>flush_workqueue() blocks until any scheduled work on any CPU has run to
>completion. If we have some work_struct pending on CPU 2, it can be
>completed only when events/2 executes it.

Couldn't flush_scheduled_work() just follow the affinity mask of the
task that made the call in the first place? If the calling task had a
cpu-mask of 3, flush_scheduled_work() would flush events/0 and
events/1; if the calling task had an affinity mask of 1, only events/0
would be flushed.

In other words changing what Oleg says above just slightly:

flush_workqueue() blocks until any scheduled work on any CPU in the
calling task's affinity mask has run to completion?

Thanks
Mark

2007-06-07 14:26:19

by Matt Mackall

[permalink] [raw]
Subject: Re: floppy.c soft lockup

On Thu, Jun 07, 2007 at 06:18:52AM -0400, Mark Hounschell wrote:
> Matt Mackall wrote:
> > On Wed, Jun 06, 2007 at 10:28:28AM -0700, Andrew Morton wrote:
> >> On Wed, 06 Jun 2007 09:12:04 -0400 Mark Hounschell <[email protected]> wrote:
> >>
> >>>> As far as a 100% CPU bound task being a valid thing to do, it has been
> >>>> done for many years on SMP machines. Any kernel limitation on this
> >>>> surely must be considered a bug?
> >>>>
> >>> Could someone authoritatively comment on this? Is a SCHED_RR/SCHED_FIFO
> >>> 100% Cpu bound process supported in an SMP env on Linux? (vanilla or -rt)
> >> It will kill the kernel, sorry.
> >>
> >> The only way in which we can fix that is to allow kernel threads to preempt
> >> rt-priority userspace threads. But if we were to do that (to benefit the
> >> few) it would cause _all_ people's rt-prio processes to experience glitches
> >> due to kernel activity, which we believe to be worse.
> >>
> >> So we're between a rock and a hard place here.
> >>
> >> If we really did want to solve this then I guess the kernel would need some
> >> new code to detect a 100%-busy rt-prio process and to then start permitting
> >> preemption of it for kernel thread activity. That detector would need to
> >> be smart enough to detect a number of 100%-busy rt-prio processes which are
> >> yielding to each other, and one rt-prio process which keeps forking others,
> >> etc. It might get tricky.
> >
> > The usual alternative is to manually chrt the relevant kernel threads
> > to RT priority and adjust the priority scheme of their processes appropriately.
> >
>
> From an earlier message in the thread:
>
> >> Mark writes:
> >> Again I don't understand why flush_scheduled_work() running on behalf
> >> of a process affinitized to processor-1 requires cooperation from
> >> events/2 (affinitized to processor-2)
> >> when there is an events/1 already affinitized to processor 1?
>
> >Oleg writes:
> >flush_workqueue() blocks until any scheduled work on any CPU has run to
> >completion. If we have some work_struct pending on CPU 2, it can be
> >completed only when events/2 executes it.
>
> Could not flush_scheduled_work() just follow the affinity mask of the
> task that caused the call to begin with. If calling task had a cpu-mask
> of 3 then flush_scheduled_work() would do the events/0 and events/1
> thing and if the calling task had an affinity mask of 1 then only
> events/0 would be done?

The kernel's internal event API doesn't track any of this stuff, and
it's not clear we'd want it to. It'd perhaps be simpler to just allow
SIGSTOPing events/0. This might even work from userspace today.

In general, it's considered a mistake to mark CPU hogs as RT precisely
because they present a starvation risk to everything else in the
system, not just kernel threads. We could add kernel infrastructure to
make events survive this sort of thing, but that will very likely just
expose another kernel or userspace livelock.

--
Mathematics is the supreme nostalgia of our time.

2007-06-08 09:55:29

by Mark Hounschell

[permalink] [raw]
Subject: Re: floppy.c soft lockup

Matt Mackall wrote:
> On Thu, Jun 07, 2007 at 06:18:52AM -0400, Mark Hounschell wrote:
>> Matt Mackall wrote:
>>> On Wed, Jun 06, 2007 at 10:28:28AM -0700, Andrew Morton wrote:
>>>> On Wed, 06 Jun 2007 09:12:04 -0400 Mark Hounschell <[email protected]> wrote:
>>>>
>>>>>> As far as a 100% CPU bound task being a valid thing to do, it has been
>>>>>> done for many years on SMP machines. Any kernel limitation on this
>>>>>> surely must be considered a bug?
>>>>>>
>>>>> Could someone authoritatively comment on this? Is a SCHED_RR/SCHED_FIFO
>>>>> 100% Cpu bound process supported in an SMP env on Linux? (vanilla or -rt)
>>>> It will kill the kernel, sorry.
>>>>
>>>> The only way in which we can fix that is to allow kernel threads to preempt
>>>> rt-priority userspace threads. But if we were to do that (to benefit the
>>>> few) it would cause _all_ people's rt-prio processes to experience glitches
>>>> due to kernel activity, which we believe to be worse.
>>>>
>>>> So we're between a rock and a hard place here.
>>>>
>>>> If we really did want to solve this then I guess the kernel would need some
> >>>> new code to detect a 100%-busy rt-prio process and to then start permitting
>>>> preemption of it for kernel thread activity. That detector would need to
>>>> be smart enough to detect a number of 100%-busy rt-prio processes which are
>>>> yielding to each other, and one rt-prio process which keeps forking others,
>>>> etc. It might get tricky.
>>> The usual alternative is to manually chrt the relevant kernel threads
>>> to RT priority and adjust the priority scheme of their processes appropriately.
>>>
>> From an earlier message in the thread:
>>
>>>> Mark writes:
>>>> Again I don't understand why flush_scheduled_work() running on behalf
>>>> of a process affinitized to processor-1 requires cooperation from
>>>> events/2 (affinitized to processor-2)
>>>> when there is an events/1 already affinitized to processor 1?
>>> Oleg writes:
>>> flush_workqueue() blocks until any scheduled work on any CPU has run to
>>> completion. If we have some work_struct pending on CPU 2, it can be
>>> completed only when events/2 executes it.
>> Could not flush_scheduled_work() just follow the affinity mask of the
>> task that caused the call to begin with. If calling task had a cpu-mask
>> of 3 then flush_scheduled_work() would do the events/0 and events/1
>> thing and if the calling task had an affinity mask of 1 then only
>> events/0 would be done?
>
> The kernel's internal event API doesn't track any of this stuff and
> it's not clear we'd want it to. It'd be a bit simpler perhaps to
> simply allow SIGSTOPing events/0. This might even work today from
> userspace.
>
> In general, it's considered a mistake to mark CPU hogs as RT precisely
> because they present a starvation risk to everything else in the
> system, not just kernel threads. We could add kernel infrastructure to
> make events survive this sort of thing, but that will very likely just
> expose another kernel or userspace livelock.
>

In general maybe, but we are not really talking in general terms here.
For everything other than kernel stuff, I (userland) would assume
responsibility. That is why I force all other userland tasks that I want
to run onto all the other processors before my RT-HOG starts.

From a userland point of view, if I have an 8-processor box, I should
be able to have 7 RT-HOGs running on it while all other processes run
on the 8th. Yet in Linux I can't have even one, because it breaks the
kernel. I'm sorry, that does not sound right to me. To me, and to the
people in my world, this is clearly a kernel deficiency.

Regards
Mark

2007-06-13 16:17:42

by Oleg Nesterov

[permalink] [raw]
Subject: Re: floppy.c soft lockup

Sorry for delay,

On 06/07, Mark Hounschell wrote:
>
> >From an earlier thread member:
>
> >> Mark writes:
> >> Again I don't understand why flush_scheduled_work() running on behalf
> >> of a process affinitized to processor-1 requires cooperation from
> >> events/2 (affinitized to processor-2)
> >> when there is an events/1 already affinitized to processor 1?
>
> >Oleg write:
> >flush_workqueue() blocks until any scheduled work on any CPU has run to
> >completion. If we have some work_struct pending on CPU 2, it can be
> >completed only when events/2 executes it.
>
> Could not flush_scheduled_work() just follow the affinity mask of the
> task that caused the call to begin with. If calling task had a cpu-mask
> of 3 then flush_scheduled_work() would do the events/0 and events/1
> thing and if the calling task had an affinity mask of 1 then only
> events/0 would be done?
>
> In other words changing what Oleg says above just slightly:
>
> flush_workqueue() blocks until any scheduled work on any CPU in the
> calling tasks affinity mask has run to completion?

No, we can't do this; it would make flush_workqueue() meaningless.

Even if we could, it wouldn't help. Suppose that a kernel thread takes
some global lock (for example, in our case cache_reap() takes
cache_chain_mutex) and is then preempted by an RT task which doesn't
relinquish the CPU.

So this problem is "wider"; flush_workqueue() was just a random victim.

Oleg.