2015-11-20 15:35:50

by Vladimir Murzin

[permalink] [raw]
Subject: [BISECTED] rcu_sched self-detected stall since 3.17

Hi,

I've been getting rcu_sched self-detected stall on one of our test
environment with the following (or similar) log:

Linux 4.3

> INFO: rcu_sched self-detected stall on CPU
> 2: (2099 ticks this GP) idle=a73/140000000000001/0 softirq=277/283 fqs=2092
> (t=2100 jiffies g=-197 c=-198 q=230)
> Task dump for CPU 2:
> paranoia R running 0 423 1 0x00000003
> [<c0016f20>] (unwind_backtrace) from [<c0012fd0>] (show_stack+0x10/0x14)
> [<c0012fd0>] (show_stack) from [<c006c52c>] (rcu_dump_cpu_stacks+0xa4/0xc4)
> [<c006c52c>] (rcu_dump_cpu_stacks) from [<c006f188>] (rcu_check_callbacks+0x2dc/0x81c)
> [<c006f188>] (rcu_check_callbacks) from [<c0072278>] (update_process_times+0x38/0x64)
> [<c0072278>] (update_process_times) from [<c007e7b8>] (tick_periodic+0xa8/0xb8)
> [<c007e7b8>] (tick_periodic) from [<c007e904>] (tick_handle_periodic+0x28/0x88)
> [<c007e904>] (tick_handle_periodic) from [<c0223074>] (arch_timer_handler_virt+0x28/0x30)
> [<c0223074>] (arch_timer_handler_virt) from [<c0068008>] (handle_percpu_devid_irq+0x6c/0x84)
> [<c0068008>] (handle_percpu_devid_irq) from [<c0064244>] (generic_handle_irq+0x24/0x34)
> [<c0064244>] (generic_handle_irq) from [<c0064518>] (__handle_domain_irq+0x98/0xac)
> [<c0064518>] (__handle_domain_irq) from [<c0009444>] (gic_handle_irq+0x54/0x90)
> [<c0009444>] (gic_handle_irq) from [<c0013b14>] (__irq_svc+0x54/0x70)
> Exception stack(0xdb007c60 to 0xdb007ca8)
> 7c60: 00000000 00000000 00000000 00000001 dc3e9a20 00000000 00000000 00000001
> 7c80: dc3e9a20 0000001b 00080000 0000000b ffffffff db007cb0 c00ac8f0 c00aa96c
> 7ca0: 600d0113 ffffffff
> [<c0013b14>] (__irq_svc) from [<c00aa96c>] (free_pages_prepare+0x18c/0x218)
> [<c00aa96c>] (free_pages_prepare) from [<c00ac8f0>] (free_hot_cold_page+0x34/0x168)
> [<c00ac8f0>] (free_hot_cold_page) from [<c00c996c>] (handle_mm_fault+0x7b8/0xe60)
> [<c00c996c>] (handle_mm_fault) from [<c001c194>] (do_page_fault+0x12c/0x378)
> [<c001c194>] (do_page_fault) from [<c00092dc>] (do_DataAbort+0x38/0xb4)
> [<c00092dc>] (do_DataAbort) from [<c0013aa0>] (__dabt_svc+0x40/0x60)
> Exception stack(0xdb007e68 to 0xdb007eb0)
> 7e60: 0001b2c8 00000d30 00000000 00000055 00000051 dbb04600
> 7e80: 00000000 db007ec8 db8ea700 db8f3080 dba43240 0001b2c8 00000000 db007eb8
> 7ea0: c012c178 c0174ce8 200d0113 ffffffff
> [<c0013aa0>] (__dabt_svc) from [<c0174ce8>] (__clear_user_std+0x34/0x68)
> [<c0174ce8>] (__clear_user_std) from [<c012c178>] (padzero+0x58/0x74)
> [<c012c178>] (padzero) from [<c012cb4c>] (load_elf_binary+0x730/0x1250)
> [<c012cb4c>] (load_elf_binary) from [<c00ecd6c>] (search_binary_handler+0xa0/0x23c)
> [<c00ecd6c>] (search_binary_handler) from [<c00ed730>] (do_execveat_common+0x444/0x5d8)
> [<c00ed730>] (do_execveat_common) from [<c00ed8e8>] (do_execve+0x24/0x2c)
> [<c00ed8e8>] (do_execve) from [<c000f580>] (ret_fast_syscall+0x0/0x3c)

(I put report for Linux 3.17 under [2])

This rcu_sched self-detected stall is usually reproduced 2 or 3 times
out of 10 runs, but sometimes even more runs go without issue.

I bisected [1] it to commit

commit 743162013d40ca612b4cb53d3a200dff2d9ab26e
Author: NeilBrown <[email protected]>
Date: Mon Jul 7 15:16:04 2014 +1000

sched: Remove proliferation of wait_on_bit() action functions

The only change I noticed is from (mm/filemap.c)

io_schedule();
fatal_signal_pending(current)

to (kernel/sched/wait.c)

signal_pending_state(current->state, current)
io_schedule();

and if I apply following diff I don't see stalls anymore.

diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index a104879..2d68cdb 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -514,9 +514,10 @@ EXPORT_SYMBOL(bit_wait);

__sched int bit_wait_io(void *word)
{
+ io_schedule();
+
if (signal_pending_state(current->state, current))
return 1;
- io_schedule();
return 0;
}
EXPORT_SYMBOL(bit_wait_io);

Any ideas why it might happen and why diff above helps?

[1] Bisect log:

I checked 4.[0-3] and 3.1[7-9] manually
I have https://lkml.org/lkml/2015/8/26/649 for better report

> git bisect start '--no-checkout'
> # good: [19583ca584d6f574384e17fe7613dfaeadcdc4a6] Linux 3.16
> git bisect good 19583ca584d6f574384e17fe7613dfaeadcdc4a6
> # bad: [53ee983378ff23e8f3ff95ecf99dea7c6c221900] Merge tag 'staging-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
> git bisect bad 53ee983378ff23e8f3ff95ecf99dea7c6c221900
> # good: [2042088cd67d0064d18c52c13c69af2499907bb1] staging: comedi: ni_labpc: tidy up labpc_ai_scan_mode()
> git bisect good 2042088cd67d0064d18c52c13c69af2499907bb1
> # bad: [98959948a7ba33cf8c708626e0d2a1456397e1c6] Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect bad 98959948a7ba33cf8c708626e0d2a1456397e1c6
> # good: [c9b88e9581828bb8bba06c5e7ee8ed1761172b6e] Merge tag 'trace-3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
> git bisect good c9b88e9581828bb8bba06c5e7ee8ed1761172b6e
> # good: [5bda4f638f36ef4c4e3b1397b02affc3db94356e] Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect good 5bda4f638f36ef4c4e3b1397b02affc3db94356e
> # good: [2336ebc32676df5b794acfe0c980583ec6c05f34] Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf into perf/core
> git bisect good 2336ebc32676df5b794acfe0c980583ec6c05f34
> # good: [ef35ad26f8ff44d2c93e29952cdb336bda729d9d] Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect good ef35ad26f8ff44d2c93e29952cdb336bda729d9d
> # good: [1c5d3eb3759013bc7ee4197aa0a9f245bdb6eb90] sched/numa: Simplify task_numa_compare()
> git bisect good 1c5d3eb3759013bc7ee4197aa0a9f245bdb6eb90
> # good: [6e76ea8a8209386c3cc7ee5594e6ea5d25525cf2] sched: Remove extra static_key*() function indirection
> git bisect good 6e76ea8a8209386c3cc7ee5594e6ea5d25525cf2
> # bad: [c1221321b7c25b53204447cff9949a6d5a7ddddc] sched: Allow wait_on_bit_action() functions to support a timeout
> git bisect bad c1221321b7c25b53204447cff9949a6d5a7ddddc
> # good: [e720fff6341fe4b95e5a93c939bd3c77fa55ced4] sched/numa: Revert "Use effective_load() to balance NUMA loads"
> git bisect good e720fff6341fe4b95e5a93c939bd3c77fa55ced4
> # bad: [743162013d40ca612b4cb53d3a200dff2d9ab26e] sched: Remove proliferation of wait_on_bit() action functions
> git bisect bad 743162013d40ca612b4cb53d3a200dff2d9ab26e
> # good: [d26fad5b38e1c4667d4f2604936e59c837caa54d] Merge tag 'v3.16-rc5' into sched/core, to refresh the branch before applying bigger tree-wide changes
> git bisect good d26fad5b38e1c4667d4f2604936e59c837caa54d

[2] rcu_sched self-detected stall under Linux 3.17

> INFO: rcu_sched self-detected stall on CPU { 2} (t=2100 jiffies g=-199 c=-200 q=172)
> Task dump for CPU 2:
> paranoia R running 0 414 1 0x00000003
> [<c0015394>] (unwind_backtrace) from [<c00117dc>] (show_stack+0x10/0x14)
> [<c00117dc>] (show_stack) from [<c00602d0>] (rcu_dump_cpu_stacks+0xa4/0xc4)
> [<c00602d0>] (rcu_dump_cpu_stacks) from [<c0062aec>] (rcu_check_callbacks+0x280/0x6f8)
> [<c0062aec>] (rcu_check_callbacks) from [<c0065b9c>] (update_process_times+0x40/0x60)
> [<c0065b9c>] (update_process_times) from [<c007169c>] (tick_periodic+0xa8/0xb8)
> [<c007169c>] (tick_periodic) from [<c00717bc>] (tick_handle_periodic+0x28/0x88)
> [<c00717bc>] (tick_handle_periodic) from [<c01fc414>] (arch_timer_handler_virt+0x28/0x30)
> [<c01fc414>] (arch_timer_handler_virt) from [<c005d100>] (handle_percpu_devid_irq+0x6c/0x84)
> [<c005d100>] (handle_percpu_devid_irq) from [<c0059658>] (generic_handle_irq+0x2c/0x3c)
> [<c0059658>] (generic_handle_irq) from [<c000ee3c>] (handle_IRQ+0x7c/0x8c)
> [<c000ee3c>] (handle_IRQ) from [<c00085f8>] (gic_handle_irq+0x40/0x5c)
> [<c00085f8>] (gic_handle_irq) from [<c0012300>] (__irq_svc+0x40/0x54)
> Exception stack(0xdb063c60 to 0xdb063ca8)
> 3c60: 00000000 0001b000 dbbfc000 0000000b dbbfc000 0000000b dbbfc000 0001b000
> 3c80: 0001b000 dbb2d6e0 0000001b db03c380 0000006c db063ca8 c00b935c c00b935c
> 3ca0: a00d0113 ffffffff
> [<c0012300>] (__irq_svc) from [<c00b935c>] (do_cow_fault.isra.97+0x1c/0x198)
> [<c00b935c>] (do_cow_fault.isra.97) from [<c00b9804>] (handle_mm_fault+0x14c/0x900)
> [<c00b9804>] (handle_mm_fault) from [<c001ae38>] (do_page_fault+0x120/0x3d8)
> [<c001ae38>] (do_page_fault) from [<c00084c0>] (do_DataAbort+0x38/0x98)
> [<c00084c0>] (do_DataAbort) from [<c0012298>] (__dabt_svc+0x38/0x60)
> Exception stack(0xdb063e70 to 0xdb063eb8)
> 3e60: 0001b2c8 00000d30 00000000 00000000
> 3e80: db902a80 dbadc100 00000000 db063ec8 db062008 dbadc200 0001b2c8 0001b6cc
> 3ea0: 00000000 db063eb8 c011689c c015a884 200d0013 ffffffff
> [<c0012298>] (__dabt_svc) from [<c015a884>] (__clear_user_std+0x34/0x64)
> [<c015a884>] (__clear_user_std) from [<c011689c>] (padzero+0x44/0x58)
> [<c011689c>] (padzero) from [<c01182e8>] (load_elf_binary+0x778/0x138c)
> [<c01182e8>] (load_elf_binary) from [<c00da960>] (search_binary_handler+0x98/0x1d8)
> [<c00da960>] (search_binary_handler) from [<c00dbc2c>] (do_execve+0x35c/0x4bc)
> [<c00dbc2c>] (do_execve) from [<c000e520>] (ret_fast_syscall+0x0/0x30)

Thanks
Vladimir


2015-12-01 11:50:24

by Vladimir Murzin

[permalink] [raw]
Subject: Re: [BISECTED] rcu_sched self-detected stall since 3.17

On 20/11/15 15:35, Vladimir Murzin wrote:
> Hi,
>
> I've been getting rcu_sched self-detected stall on one of our test
> environment with the following (or similar) log:
>
> Linux 4.3

Anybody looking into this? Oleg, Peter, any ideas what's happening here?

Thanks
Vladimir

>
>> INFO: rcu_sched self-detected stall on CPU
>> 2: (2099 ticks this GP) idle=a73/140000000000001/0 softirq=277/283 fqs=2092
>> (t=2100 jiffies g=-197 c=-198 q=230)
>> Task dump for CPU 2:
>> paranoia R running 0 423 1 0x00000003
>> [<c0016f20>] (unwind_backtrace) from [<c0012fd0>] (show_stack+0x10/0x14)
>> [<c0012fd0>] (show_stack) from [<c006c52c>] (rcu_dump_cpu_stacks+0xa4/0xc4)
>> [<c006c52c>] (rcu_dump_cpu_stacks) from [<c006f188>] (rcu_check_callbacks+0x2dc/0x81c)
>> [<c006f188>] (rcu_check_callbacks) from [<c0072278>] (update_process_times+0x38/0x64)
>> [<c0072278>] (update_process_times) from [<c007e7b8>] (tick_periodic+0xa8/0xb8)
>> [<c007e7b8>] (tick_periodic) from [<c007e904>] (tick_handle_periodic+0x28/0x88)
>> [<c007e904>] (tick_handle_periodic) from [<c0223074>] (arch_timer_handler_virt+0x28/0x30)
>> [<c0223074>] (arch_timer_handler_virt) from [<c0068008>] (handle_percpu_devid_irq+0x6c/0x84)
>> [<c0068008>] (handle_percpu_devid_irq) from [<c0064244>] (generic_handle_irq+0x24/0x34)
>> [<c0064244>] (generic_handle_irq) from [<c0064518>] (__handle_domain_irq+0x98/0xac)
>> [<c0064518>] (__handle_domain_irq) from [<c0009444>] (gic_handle_irq+0x54/0x90)
>> [<c0009444>] (gic_handle_irq) from [<c0013b14>] (__irq_svc+0x54/0x70)
>> Exception stack(0xdb007c60 to 0xdb007ca8)
>> 7c60: 00000000 00000000 00000000 00000001 dc3e9a20 00000000 00000000 00000001
>> 7c80: dc3e9a20 0000001b 00080000 0000000b ffffffff db007cb0 c00ac8f0 c00aa96c
>> 7ca0: 600d0113 ffffffff
>> [<c0013b14>] (__irq_svc) from [<c00aa96c>] (free_pages_prepare+0x18c/0x218)
>> [<c00aa96c>] (free_pages_prepare) from [<c00ac8f0>] (free_hot_cold_page+0x34/0x168)
>> [<c00ac8f0>] (free_hot_cold_page) from [<c00c996c>] (handle_mm_fault+0x7b8/0xe60)
>> [<c00c996c>] (handle_mm_fault) from [<c001c194>] (do_page_fault+0x12c/0x378)
>> [<c001c194>] (do_page_fault) from [<c00092dc>] (do_DataAbort+0x38/0xb4)
>> [<c00092dc>] (do_DataAbort) from [<c0013aa0>] (__dabt_svc+0x40/0x60)
>> Exception stack(0xdb007e68 to 0xdb007eb0)
>> 7e60: 0001b2c8 00000d30 00000000 00000055 00000051 dbb04600
>> 7e80: 00000000 db007ec8 db8ea700 db8f3080 dba43240 0001b2c8 00000000 db007eb8
>> 7ea0: c012c178 c0174ce8 200d0113 ffffffff
>> [<c0013aa0>] (__dabt_svc) from [<c0174ce8>] (__clear_user_std+0x34/0x68)
>> [<c0174ce8>] (__clear_user_std) from [<c012c178>] (padzero+0x58/0x74)
>> [<c012c178>] (padzero) from [<c012cb4c>] (load_elf_binary+0x730/0x1250)
>> [<c012cb4c>] (load_elf_binary) from [<c00ecd6c>] (search_binary_handler+0xa0/0x23c)
>> [<c00ecd6c>] (search_binary_handler) from [<c00ed730>] (do_execveat_common+0x444/0x5d8)
>> [<c00ed730>] (do_execveat_common) from [<c00ed8e8>] (do_execve+0x24/0x2c)
>> [<c00ed8e8>] (do_execve) from [<c000f580>] (ret_fast_syscall+0x0/0x3c)
>
> (I put report for Linux 3.17 under [2])
>
> This rcu_sched self-detected stall is usually reproduced 2 or 3 times
> out of 10 runs, but sometimes even more runs go without issue.
>
> I bisected [1] it to commit
>
> commit 743162013d40ca612b4cb53d3a200dff2d9ab26e
> Author: NeilBrown <[email protected]>
> Date: Mon Jul 7 15:16:04 2014 +1000
>
> sched: Remove proliferation of wait_on_bit() action functions
>
> The only change I noticed is from (mm/filemap.c)
>
> io_schedule();
> fatal_signal_pending(current)
>
> to (kernel/sched/wait.c)
>
> signal_pending_state(current->state, current)
> io_schedule();
>
> and if I apply following diff I don't see stalls anymore.
>
> diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
> index a104879..2d68cdb 100644
> --- a/kernel/sched/wait.c
> +++ b/kernel/sched/wait.c
> @@ -514,9 +514,10 @@ EXPORT_SYMBOL(bit_wait);
>
> __sched int bit_wait_io(void *word)
> {
> + io_schedule();
> +
> if (signal_pending_state(current->state, current))
> return 1;
> - io_schedule();
> return 0;
> }
> EXPORT_SYMBOL(bit_wait_io);
>
> Any ideas why it might happen and why diff above helps?
>
> [1] Bisect log:
>
> I checked 4.[0-3] and 3.1[7-9] manually
> I have https://lkml.org/lkml/2015/8/26/649 for better report
>
>> git bisect start '--no-checkout'
>> # good: [19583ca584d6f574384e17fe7613dfaeadcdc4a6] Linux 3.16
>> git bisect good 19583ca584d6f574384e17fe7613dfaeadcdc4a6
>> # bad: [53ee983378ff23e8f3ff95ecf99dea7c6c221900] Merge tag 'staging-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
>> git bisect bad 53ee983378ff23e8f3ff95ecf99dea7c6c221900
>> # good: [2042088cd67d0064d18c52c13c69af2499907bb1] staging: comedi: ni_labpc: tidy up labpc_ai_scan_mode()
>> git bisect good 2042088cd67d0064d18c52c13c69af2499907bb1
>> # bad: [98959948a7ba33cf8c708626e0d2a1456397e1c6] Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>> git bisect bad 98959948a7ba33cf8c708626e0d2a1456397e1c6
>> # good: [c9b88e9581828bb8bba06c5e7ee8ed1761172b6e] Merge tag 'trace-3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
>> git bisect good c9b88e9581828bb8bba06c5e7ee8ed1761172b6e
>> # good: [5bda4f638f36ef4c4e3b1397b02affc3db94356e] Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>> git bisect good 5bda4f638f36ef4c4e3b1397b02affc3db94356e
>> # good: [2336ebc32676df5b794acfe0c980583ec6c05f34] Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf into perf/core
>> git bisect good 2336ebc32676df5b794acfe0c980583ec6c05f34
>> # good: [ef35ad26f8ff44d2c93e29952cdb336bda729d9d] Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>> git bisect good ef35ad26f8ff44d2c93e29952cdb336bda729d9d
>> # good: [1c5d3eb3759013bc7ee4197aa0a9f245bdb6eb90] sched/numa: Simplify task_numa_compare()
>> git bisect good 1c5d3eb3759013bc7ee4197aa0a9f245bdb6eb90
>> # good: [6e76ea8a8209386c3cc7ee5594e6ea5d25525cf2] sched: Remove extra static_key*() function indirection
>> git bisect good 6e76ea8a8209386c3cc7ee5594e6ea5d25525cf2
>> # bad: [c1221321b7c25b53204447cff9949a6d5a7ddddc] sched: Allow wait_on_bit_action() functions to support a timeout
>> git bisect bad c1221321b7c25b53204447cff9949a6d5a7ddddc
>> # good: [e720fff6341fe4b95e5a93c939bd3c77fa55ced4] sched/numa: Revert "Use effective_load() to balance NUMA loads"
>> git bisect good e720fff6341fe4b95e5a93c939bd3c77fa55ced4
>> # bad: [743162013d40ca612b4cb53d3a200dff2d9ab26e] sched: Remove proliferation of wait_on_bit() action functions
>> git bisect bad 743162013d40ca612b4cb53d3a200dff2d9ab26e
>> # good: [d26fad5b38e1c4667d4f2604936e59c837caa54d] Merge tag 'v3.16-rc5' into sched/core, to refresh the branch before applying bigger tree-wide changes
>> git bisect good d26fad5b38e1c4667d4f2604936e59c837caa54d
>
> [2] rcu_sched self-detected stall under Linux 3.17
>
>> INFO: rcu_sched self-detected stall on CPU { 2} (t=2100 jiffies g=-199 c=-200 q=172)
>> Task dump for CPU 2:
>> paranoia R running 0 414 1 0x00000003
>> [<c0015394>] (unwind_backtrace) from [<c00117dc>] (show_stack+0x10/0x14)
>> [<c00117dc>] (show_stack) from [<c00602d0>] (rcu_dump_cpu_stacks+0xa4/0xc4)
>> [<c00602d0>] (rcu_dump_cpu_stacks) from [<c0062aec>] (rcu_check_callbacks+0x280/0x6f8)
>> [<c0062aec>] (rcu_check_callbacks) from [<c0065b9c>] (update_process_times+0x40/0x60)
>> [<c0065b9c>] (update_process_times) from [<c007169c>] (tick_periodic+0xa8/0xb8)
>> [<c007169c>] (tick_periodic) from [<c00717bc>] (tick_handle_periodic+0x28/0x88)
>> [<c00717bc>] (tick_handle_periodic) from [<c01fc414>] (arch_timer_handler_virt+0x28/0x30)
>> [<c01fc414>] (arch_timer_handler_virt) from [<c005d100>] (handle_percpu_devid_irq+0x6c/0x84)
>> [<c005d100>] (handle_percpu_devid_irq) from [<c0059658>] (generic_handle_irq+0x2c/0x3c)
>> [<c0059658>] (generic_handle_irq) from [<c000ee3c>] (handle_IRQ+0x7c/0x8c)
>> [<c000ee3c>] (handle_IRQ) from [<c00085f8>] (gic_handle_irq+0x40/0x5c)
>> [<c00085f8>] (gic_handle_irq) from [<c0012300>] (__irq_svc+0x40/0x54)
>> Exception stack(0xdb063c60 to 0xdb063ca8)
>> 3c60: 00000000 0001b000 dbbfc000 0000000b dbbfc000 0000000b dbbfc000 0001b000
>> 3c80: 0001b000 dbb2d6e0 0000001b db03c380 0000006c db063ca8 c00b935c c00b935c
>> 3ca0: a00d0113 ffffffff
>> [<c0012300>] (__irq_svc) from [<c00b935c>] (do_cow_fault.isra.97+0x1c/0x198)
>> [<c00b935c>] (do_cow_fault.isra.97) from [<c00b9804>] (handle_mm_fault+0x14c/0x900)
>> [<c00b9804>] (handle_mm_fault) from [<c001ae38>] (do_page_fault+0x120/0x3d8)
>> [<c001ae38>] (do_page_fault) from [<c00084c0>] (do_DataAbort+0x38/0x98)
>> [<c00084c0>] (do_DataAbort) from [<c0012298>] (__dabt_svc+0x38/0x60)
>> Exception stack(0xdb063e70 to 0xdb063eb8)
>> 3e60: 0001b2c8 00000d30 00000000 00000000
>> 3e80: db902a80 dbadc100 00000000 db063ec8 db062008 dbadc200 0001b2c8 0001b6cc
>> 3ea0: 00000000 db063eb8 c011689c c015a884 200d0013 ffffffff
>> [<c0012298>] (__dabt_svc) from [<c015a884>] (__clear_user_std+0x34/0x64)
>> [<c015a884>] (__clear_user_std) from [<c011689c>] (padzero+0x44/0x58)
>> [<c011689c>] (padzero) from [<c01182e8>] (load_elf_binary+0x778/0x138c)
>> [<c01182e8>] (load_elf_binary) from [<c00da960>] (search_binary_handler+0x98/0x1d8)
>> [<c00da960>] (search_binary_handler) from [<c00dbc2c>] (do_execve+0x35c/0x4bc)
>> [<c00dbc2c>] (do_execve) from [<c000e520>] (ret_fast_syscall+0x0/0x30)
>
> Thanks
> Vladimir
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
>
>
>

2015-12-01 13:04:11

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [BISECTED] rcu_sched self-detected stall since 3.17

Sorry for the delay and thanks for the reminder!

On Fri, Nov 20, 2015 at 03:35:38PM +0000, Vladimir Murzin wrote:
> commit 743162013d40ca612b4cb53d3a200dff2d9ab26e
> Author: NeilBrown <[email protected]>
> Date: Mon Jul 7 15:16:04 2014 +1000
>
> sched: Remove proliferation of wait_on_bit() action functions
>
> The only change I noticed is from (mm/filemap.c)
>
> io_schedule();
> fatal_signal_pending(current)
>
> to (kernel/sched/wait.c)
>
> signal_pending_state(current->state, current)
> io_schedule();
>
> and if I apply following diff I don't see stalls anymore.
>
> diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
> index a104879..2d68cdb 100644
> --- a/kernel/sched/wait.c
> +++ b/kernel/sched/wait.c
> @@ -514,9 +514,10 @@ EXPORT_SYMBOL(bit_wait);
>
> __sched int bit_wait_io(void *word)
> {
> + io_schedule();
> +
> if (signal_pending_state(current->state, current))
> return 1;
> - io_schedule();
> return 0;
> }
> EXPORT_SYMBOL(bit_wait_io);
>
> Any ideas why it might happen and why diff above helps?

Yes, the code as presented is simply wrong. And in fact most of the code
it replaced was of the right form (with a few exceptions which would
indeed have been subject to the same problem you've observed.

Note how the late:

- cifs_sb_tcon_pending_wait
- fscache_wait_bit_interruptible
- sleep_on_page_killable
- wait_inquiry
- key_wait_bit_intr

All check the signal state _after_ calling schedule().

As opposed to:

- gfs2_journalid_wait

which follows the broken pattern.

Further notice that most expect a return of -EINTR, which also seems
correct given that this is a signal, those that do not return -EINTR
only check for a !0 return value so would work equally well with -EINTR.

The reason this is broken is that schedule() will no-op when there is a
pending signal, while raising a signal will also issue a wakeup.

Thus the right thing to do is check for the signal state after, that way
you handle both cases:

- calling schedule() with a signal pending
- receiving a signal while sleeping

As such, I would propose the below patch. Oleg, do you concur?

---
Subject: sched,wait: Fix signal handling in bit wait helpers

Vladimir reported getting RCU stall warnings and bisected it back to
commit 743162013d40. That commit inadvertently reversed the calls to
schedule() and signal_pending(), thereby not handling the case where the
signal receives while we sleep.

Fixes: 743162013d40 ("sched: Remove proliferation of wait_on_bit() action functions")
Fixes: cbbce8220949 ("SCHED: add some "wait..on_bit...timeout()" interfaces.")
Reported-by: Vladimir Murzin <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
kernel/sched/wait.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 052e02672d12..f10bd873e684 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -583,18 +583,18 @@ EXPORT_SYMBOL(wake_up_atomic_t);

__sched int bit_wait(struct wait_bit_key *word)
{
- if (signal_pending_state(current->state, current))
- return 1;
schedule();
+ if (signal_pending(current))
+ return -EINTR;
return 0;
}
EXPORT_SYMBOL(bit_wait);

__sched int bit_wait_io(struct wait_bit_key *word)
{
- if (signal_pending_state(current->state, current))
- return 1;
io_schedule();
+ if (signal_pending(current))
+ return -EINTR;
return 0;
}
EXPORT_SYMBOL(bit_wait_io);
@@ -602,11 +602,11 @@ EXPORT_SYMBOL(bit_wait_io);
__sched int bit_wait_timeout(struct wait_bit_key *word)
{
unsigned long now = READ_ONCE(jiffies);
- if (signal_pending_state(current->state, current))
- return 1;
if (time_after_eq(now, word->timeout))
return -EAGAIN;
schedule_timeout(word->timeout - now);
+ if (signal_pending(current))
+ return -EINTR;
return 0;
}
EXPORT_SYMBOL_GPL(bit_wait_timeout);
@@ -614,11 +614,11 @@ EXPORT_SYMBOL_GPL(bit_wait_timeout);
__sched int bit_wait_io_timeout(struct wait_bit_key *word)
{
unsigned long now = READ_ONCE(jiffies);
- if (signal_pending_state(current->state, current))
- return 1;
if (time_after_eq(now, word->timeout))
return -EAGAIN;
io_schedule_timeout(word->timeout - now);
+ if (signal_pending(current))
+ return -EINTR;
return 0;
}
EXPORT_SYMBOL_GPL(bit_wait_io_timeout);

2015-12-02 06:53:06

by Ross Green

[permalink] [raw]
Subject: Re: [BISECTED] rcu_sched self-detected stall since 3.17

I too have traced a similar RCU stall all the way back to 3.17.

Seems to affect RCU preempt code as well.

I had not been able to find a sure fail way to trigger this and it
could take several days running before problem arises.

Ross

2015-12-02 09:04:24

by Vladimir Murzin

[permalink] [raw]
Subject: Re: [BISECTED] rcu_sched self-detected stall since 3.17

On 01/12/15 13:04, Peter Zijlstra wrote:
> Sorry for the delay and thanks for the reminder!
>
> On Fri, Nov 20, 2015 at 03:35:38PM +0000, Vladimir Murzin wrote:
>> commit 743162013d40ca612b4cb53d3a200dff2d9ab26e
>> Author: NeilBrown <[email protected]>
>> Date: Mon Jul 7 15:16:04 2014 +1000
>>
>> sched: Remove proliferation of wait_on_bit() action functions
>>
>> The only change I noticed is from (mm/filemap.c)
>>
>> io_schedule();
>> fatal_signal_pending(current)
>>
>> to (kernel/sched/wait.c)
>>
>> signal_pending_state(current->state, current)
>> io_schedule();
>>
>> and if I apply following diff I don't see stalls anymore.
>>
>> diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
>> index a104879..2d68cdb 100644
>> --- a/kernel/sched/wait.c
>> +++ b/kernel/sched/wait.c
>> @@ -514,9 +514,10 @@ EXPORT_SYMBOL(bit_wait);
>>
>> __sched int bit_wait_io(void *word)
>> {
>> + io_schedule();
>> +
>> if (signal_pending_state(current->state, current))
>> return 1;
>> - io_schedule();
>> return 0;
>> }
>> EXPORT_SYMBOL(bit_wait_io);
>>
>> Any ideas why it might happen and why diff above helps?
>
> Yes, the code as presented is simply wrong. And in fact most of the code
> it replaced was of the right form (with a few exceptions which would
> indeed have been subject to the same problem you've observed.
>
> Note how the late:
>
> - cifs_sb_tcon_pending_wait
> - fscache_wait_bit_interruptible
> - sleep_on_page_killable
> - wait_inquiry
> - key_wait_bit_intr
>
> All check the signal state _after_ calling schedule().
>
> As opposed to:
>
> - gfs2_journalid_wait
>
> which follows the broken pattern.
>
> Further notice that most expect a return of -EINTR, which also seems
> correct given that this is a signal, those that do not return -EINTR
> only check for a !0 return value so would work equally well with -EINTR.
>
> The reason this is broken is that schedule() will no-op when there is a
> pending signal, while raising a signal will also issue a wakeup.
>

Glad to hear confirmation on a problem. Thanks for detailed answer!

> Thus the right thing to do is check for the signal state after, that way
> you handle both cases:
>
> - calling schedule() with a signal pending
> - receiving a signal while sleeping
>
> As such, I would propose the below patch. Oleg, do you concur?
>
> ---
> Subject: sched,wait: Fix signal handling in bit wait helpers
>
> Vladimir reported getting RCU stall warnings and bisected it back to
> commit 743162013d40. That commit inadvertently reversed the calls to
> schedule() and signal_pending(), thereby not handling the case where the
> signal receives while we sleep.
>
> Fixes: 743162013d40 ("sched: Remove proliferation of wait_on_bit() action functions")
> Fixes: cbbce8220949 ("SCHED: add some "wait..on_bit...timeout()" interfaces.")
> Reported-by: Vladimir Murzin <[email protected]>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> ---
> kernel/sched/wait.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
> index 052e02672d12..f10bd873e684 100644
> --- a/kernel/sched/wait.c
> +++ b/kernel/sched/wait.c
> @@ -583,18 +583,18 @@ EXPORT_SYMBOL(wake_up_atomic_t);
>
> __sched int bit_wait(struct wait_bit_key *word)
> {
> - if (signal_pending_state(current->state, current))
> - return 1;
> schedule();
> + if (signal_pending(current))
> + return -EINTR;
> return 0;
> }
> EXPORT_SYMBOL(bit_wait);
>
> __sched int bit_wait_io(struct wait_bit_key *word)
> {
> - if (signal_pending_state(current->state, current))
> - return 1;
> io_schedule();
> + if (signal_pending(current))
> + return -EINTR;
> return 0;
> }
> EXPORT_SYMBOL(bit_wait_io);
> @@ -602,11 +602,11 @@ EXPORT_SYMBOL(bit_wait_io);
> __sched int bit_wait_timeout(struct wait_bit_key *word)
> {
> unsigned long now = READ_ONCE(jiffies);
> - if (signal_pending_state(current->state, current))
> - return 1;
> if (time_after_eq(now, word->timeout))
> return -EAGAIN;
> schedule_timeout(word->timeout - now);
> + if (signal_pending(current))
> + return -EINTR;
> return 0;
> }
> EXPORT_SYMBOL_GPL(bit_wait_timeout);
> @@ -614,11 +614,11 @@ EXPORT_SYMBOL_GPL(bit_wait_timeout);
> __sched int bit_wait_io_timeout(struct wait_bit_key *word)
> {
> unsigned long now = READ_ONCE(jiffies);
> - if (signal_pending_state(current->state, current))
> - return 1;
> if (time_after_eq(now, word->timeout))
> return -EAGAIN;
> io_schedule_timeout(word->timeout - now);
> + if (signal_pending(current))
> + return -EINTR;
> return 0;
> }
> EXPORT_SYMBOL_GPL(bit_wait_io_timeout);
>

I run it overnight on top of 4.3 and didn't see stalls. So in case it helps

Tested-by: Vladimir Murzin <[email protected]>

Cheers
Vladimir

Subject: [tip:locking/core] sched/wait: Fix signal handling in bit wait helpers

Commit-ID: 68985633bccb6066bf1803e316fbc6c1f5b796d6
Gitweb: http://git.kernel.org/tip/68985633bccb6066bf1803e316fbc6c1f5b796d6
Author: Peter Zijlstra <[email protected]>
AuthorDate: Tue, 1 Dec 2015 14:04:04 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 4 Dec 2015 10:10:15 +0100

sched/wait: Fix signal handling in bit wait helpers

Vladimir reported getting RCU stall warnings and bisected it back to
commit:

743162013d40 ("sched: Remove proliferation of wait_on_bit() action functions")

That commit inadvertently reversed the calls to schedule() and signal_pending(),
thereby not handling the case where the signal receives while we sleep.

Reported-by: Vladimir Murzin <[email protected]>
Tested-by: Vladimir Murzin <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Fixes: 743162013d40 ("sched: Remove proliferation of wait_on_bit() action functions")
Fixes: cbbce8220949 ("SCHED: add some "wait..on_bit...timeout()" interfaces.")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/wait.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 052e026..f10bd87 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -583,18 +583,18 @@ EXPORT_SYMBOL(wake_up_atomic_t);

__sched int bit_wait(struct wait_bit_key *word)
{
- if (signal_pending_state(current->state, current))
- return 1;
schedule();
+ if (signal_pending(current))
+ return -EINTR;
return 0;
}
EXPORT_SYMBOL(bit_wait);

__sched int bit_wait_io(struct wait_bit_key *word)
{
- if (signal_pending_state(current->state, current))
- return 1;
io_schedule();
+ if (signal_pending(current))
+ return -EINTR;
return 0;
}
EXPORT_SYMBOL(bit_wait_io);
@@ -602,11 +602,11 @@ EXPORT_SYMBOL(bit_wait_io);
__sched int bit_wait_timeout(struct wait_bit_key *word)
{
unsigned long now = READ_ONCE(jiffies);
- if (signal_pending_state(current->state, current))
- return 1;
if (time_after_eq(now, word->timeout))
return -EAGAIN;
schedule_timeout(word->timeout - now);
+ if (signal_pending(current))
+ return -EINTR;
return 0;
}
EXPORT_SYMBOL_GPL(bit_wait_timeout);
@@ -614,11 +614,11 @@ EXPORT_SYMBOL_GPL(bit_wait_timeout);
__sched int bit_wait_io_timeout(struct wait_bit_key *word)
{
unsigned long now = READ_ONCE(jiffies);
- if (signal_pending_state(current->state, current))
- return 1;
if (time_after_eq(now, word->timeout))
return -EAGAIN;
io_schedule_timeout(word->timeout - now);
+ if (signal_pending(current))
+ return -EINTR;
return 0;
}
EXPORT_SYMBOL_GPL(bit_wait_io_timeout);

2015-12-08 10:47:18

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [tip:locking/core] sched/wait: Fix signal handling in bit wait helpers

On Fri, Dec 04, 2015 at 03:52:12AM -0800, tip-bot for Peter Zijlstra wrote:
> Commit-ID: 68985633bccb6066bf1803e316fbc6c1f5b796d6
> Gitweb: http://git.kernel.org/tip/68985633bccb6066bf1803e316fbc6c1f5b796d6
> Author: Peter Zijlstra <[email protected]>
> AuthorDate: Tue, 1 Dec 2015 14:04:04 +0100
> Committer: Ingo Molnar <[email protected]>
> CommitDate: Fri, 4 Dec 2015 10:10:15 +0100
>
> sched/wait: Fix signal handling in bit wait helpers
>
> Vladimir reported getting RCU stall warnings and bisected it back to
> commit:
>
> 743162013d40 ("sched: Remove proliferation of wait_on_bit() action functions")
>
> That commit inadvertently reversed the calls to schedule() and signal_pending(),
> thereby not handling the case where the signal receives while we sleep.
>
> Reported-by: Vladimir Murzin <[email protected]>
> Tested-by: Vladimir Murzin <[email protected]>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Mike Galbraith <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Fixes: 743162013d40 ("sched: Remove proliferation of wait_on_bit() action functions")
> Fixes: cbbce8220949 ("SCHED: add some "wait..on_bit...timeout()" interfaces.")
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: Ingo Molnar <[email protected]>
> ---
> kernel/sched/wait.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
> index 052e026..f10bd87 100644
> --- a/kernel/sched/wait.c
> +++ b/kernel/sched/wait.c
> @@ -583,18 +583,18 @@ EXPORT_SYMBOL(wake_up_atomic_t);
>
> __sched int bit_wait(struct wait_bit_key *word)
> {
> - if (signal_pending_state(current->state, current))
> - return 1;
> schedule();
> + if (signal_pending(current))
> + return -EINTR;
> return 0;
> }

*sigh*, so that patch was broken.. the below might fix it, but please
someone look at it, I seem to have a less than stellar track record
here...

---
Subject: sched/wait: Fix the signal handling fix

Jan Stancek reported that I wrecked things for him by fixing things for
Vladimir :/

His report was due to an UNINTERRUPTIBLE wait getting -EINTR, which
should not be possible, however my previous patch made this possible by
unconditionally checking signal_pending().

We cannot use current->state as was done previously, because the
instruction after the store to that variable it can be changed. We must
instead pass the initial state along and use that.

Fixes: 68985633bccb ("sched/wait: Fix signal handling in bit wait helpers")
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Vladimir Murzin <[email protected]>
Reported-by: Jan Stancek <[email protected]>
Tested-by: Jan Stancek <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
fs/cifs/inode.c | 6 +++---
fs/nfs/inode.c | 6 +++---
fs/nfs/internal.h | 2 +-
fs/nfs/pagelist.c | 2 +-
fs/nfs/pnfs.c | 4 ++--
include/linux/wait.h | 10 +++++-----
kernel/sched/wait.c | 20 ++++++++++----------
net/sunrpc/sched.c | 6 +++---
8 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index 6b66dd5..a329f5b 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -1831,11 +1831,11 @@ cifs_invalidate_mapping(struct inode *inode)
* @word: long word containing the bit lock
*/
static int
-cifs_wait_bit_killable(struct wait_bit_key *key)
+cifs_wait_bit_killable(struct wait_bit_key *key, int mode)
{
- if (fatal_signal_pending(current))
- return -ERESTARTSYS;
freezable_schedule_unsafe();
+ if (signal_pending_state(mode, current))
+ return -ERESTARTSYS;
return 0;
}

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 31b0a52..c7e8b87 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -75,11 +75,11 @@ nfs_fattr_to_ino_t(struct nfs_fattr *fattr)
* nfs_wait_bit_killable - helper for functions that are sleeping on bit locks
* @word: long word containing the bit lock
*/
-int nfs_wait_bit_killable(struct wait_bit_key *key)
+int nfs_wait_bit_killable(struct wait_bit_key *key, int mode)
{
- if (fatal_signal_pending(current))
- return -ERESTARTSYS;
freezable_schedule_unsafe();
+ if (signal_pending_state(mode, current))
+ return -ERESTARTSYS;
return 0;
}
EXPORT_SYMBOL_GPL(nfs_wait_bit_killable);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 56cfde2..9dea85f 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -379,7 +379,7 @@ extern int nfs_drop_inode(struct inode *);
extern void nfs_clear_inode(struct inode *);
extern void nfs_evict_inode(struct inode *);
void nfs_zap_acl_cache(struct inode *inode);
-extern int nfs_wait_bit_killable(struct wait_bit_key *key);
+extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode);

/* super.c */
extern const struct super_operations nfs_sops;
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index fe3ddd2..452a011 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -129,7 +129,7 @@ __nfs_iocounter_wait(struct nfs_io_counter *c)
set_bit(NFS_IO_INPROGRESS, &c->flags);
if (atomic_read(&c->io_count) == 0)
break;
- ret = nfs_wait_bit_killable(&q.key);
+ ret = nfs_wait_bit_killable(&q.key, TASK_KILLABLE);
} while (atomic_read(&c->io_count) != 0 && !ret);
finish_wait(wq, &q.wait);
return ret;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 5a8ae21..bec0384 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1466,11 +1466,11 @@ static bool pnfs_within_mdsthreshold(struct nfs_open_context *ctx,
}

/* stop waiting if someone clears NFS_LAYOUT_RETRY_LAYOUTGET bit. */
-static int pnfs_layoutget_retry_bit_wait(struct wait_bit_key *key)
+static int pnfs_layoutget_retry_bit_wait(struct wait_bit_key *key, int mode)
{
if (!test_bit(NFS_LAYOUT_RETRY_LAYOUTGET, key->flags))
return 1;
- return nfs_wait_bit_killable(key);
+ return nfs_wait_bit_killable(key, mode);
}

static bool pnfs_prepare_to_retry_layoutget(struct pnfs_layout_hdr *lo)
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 1e1bf9f..513b36f 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -145,7 +145,7 @@ __remove_wait_queue(wait_queue_head_t *head, wait_queue_t *old)
list_del(&old->task_list);
}

-typedef int wait_bit_action_f(struct wait_bit_key *);
+typedef int wait_bit_action_f(struct wait_bit_key *, int mode);
void __wake_up(wait_queue_head_t *q, unsigned int mode, int nr, void *key);
void __wake_up_locked_key(wait_queue_head_t *q, unsigned int mode, void *key);
void __wake_up_sync_key(wait_queue_head_t *q, unsigned int mode, int nr, void *key);
@@ -960,10 +960,10 @@ int wake_bit_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
} while (0)


-extern int bit_wait(struct wait_bit_key *);
-extern int bit_wait_io(struct wait_bit_key *);
-extern int bit_wait_timeout(struct wait_bit_key *);
-extern int bit_wait_io_timeout(struct wait_bit_key *);
+extern int bit_wait(struct wait_bit_key *, int);
+extern int bit_wait_io(struct wait_bit_key *, int);
+extern int bit_wait_timeout(struct wait_bit_key *, int);
+extern int bit_wait_io_timeout(struct wait_bit_key *, int);

/**
* wait_on_bit - wait for a bit to be cleared
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index f10bd87..f15d6b6 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -392,7 +392,7 @@ __wait_on_bit(wait_queue_head_t *wq, struct wait_bit_queue *q,
do {
prepare_to_wait(wq, &q->wait, mode);
if (test_bit(q->key.bit_nr, q->key.flags))
- ret = (*action)(&q->key);
+ ret = (*action)(&q->key, mode);
} while (test_bit(q->key.bit_nr, q->key.flags) && !ret);
finish_wait(wq, &q->wait);
return ret;
@@ -431,7 +431,7 @@ __wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
prepare_to_wait_exclusive(wq, &q->wait, mode);
if (!test_bit(q->key.bit_nr, q->key.flags))
continue;
- ret = action(&q->key);
+ ret = action(&q->key, mode);
if (!ret)
continue;
abort_exclusive_wait(wq, &q->wait, mode, &q->key);
@@ -581,43 +581,43 @@ void wake_up_atomic_t(atomic_t *p)
}
EXPORT_SYMBOL(wake_up_atomic_t);

-__sched int bit_wait(struct wait_bit_key *word)
+__sched int bit_wait(struct wait_bit_key *word, int mode)
{
schedule();
- if (signal_pending(current))
+ if (signal_pending_state(mode, current))
return -EINTR;
return 0;
}
EXPORT_SYMBOL(bit_wait);

-__sched int bit_wait_io(struct wait_bit_key *word)
+__sched int bit_wait_io(struct wait_bit_key *word, int mode)
{
io_schedule();
- if (signal_pending(current))
+ if (signal_pending_state(mode, current))
return -EINTR;
return 0;
}
EXPORT_SYMBOL(bit_wait_io);

-__sched int bit_wait_timeout(struct wait_bit_key *word)
+__sched int bit_wait_timeout(struct wait_bit_key *word, int mode)
{
unsigned long now = READ_ONCE(jiffies);
if (time_after_eq(now, word->timeout))
return -EAGAIN;
schedule_timeout(word->timeout - now);
- if (signal_pending(current))
+ if (signal_pending_state(mode, current))
return -EINTR;
return 0;
}
EXPORT_SYMBOL_GPL(bit_wait_timeout);

-__sched int bit_wait_io_timeout(struct wait_bit_key *word)
+__sched int bit_wait_io_timeout(struct wait_bit_key *word, int mode)
{
unsigned long now = READ_ONCE(jiffies);
if (time_after_eq(now, word->timeout))
return -EAGAIN;
io_schedule_timeout(word->timeout - now);
- if (signal_pending(current))
+ if (signal_pending_state(mode, current))
return -EINTR;
return 0;
}
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index f14f24e..73ad57a 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -250,11 +250,11 @@ void rpc_destroy_wait_queue(struct rpc_wait_queue *queue)
}
EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);

-static int rpc_wait_bit_killable(struct wait_bit_key *key)
+static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
{
- if (fatal_signal_pending(current))
- return -ERESTARTSYS;
freezable_schedule_unsafe();
+ if (signal_pending_state(mode, current))
+ return -ERESTARTSYS;
return 0;
}

2015-12-09 01:06:45

by NeilBrown

[permalink] [raw]
Subject: Re: [tip:locking/core] sched/wait: Fix signal handling in bit wait helpers

On Tue, Dec 08 2015, Peter Zijlstra wrote:

>>
>
> *sigh*, so that patch was broken.. the below might fix it, but please
> someone look at it, I seem to have a less than stellar track record
> here...

This new change seems to be more intrusive than should be needed.
Can't we just do:


__sched int bit_wait(struct wait_bit_key *word)
{
+ long state = current->state;
- if (signal_pending_state(current->state, current))
- return 1;
schedule();
+ if (signal_pending_state(state, current))
+ return -EINTR;
return 0;
}

??

(and sorry for breaking this in the first place!)

NeilBrown


Attachments:
signature.asc (818.00 B)

2015-12-09 07:40:40

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [tip:locking/core] sched/wait: Fix signal handling in bit wait helpers

On Wed, Dec 09, 2015 at 12:06:33PM +1100, NeilBrown wrote:
> On Tue, Dec 08 2015, Peter Zijlstra wrote:
>
> >>
> >
> > *sigh*, so that patch was broken.. the below might fix it, but please
> > someone look at it, I seem to have a less than stellar track record
> > here...
>
> This new change seems to be more intrusive than should be needed.
> Can't we just do:
>
>
> __sched int bit_wait(struct wait_bit_key *word)
> {
> + long state = current->state;

No, current->state can already be changed by this time.

2015-12-09 21:30:23

by NeilBrown

[permalink] [raw]
Subject: Re: [tip:locking/core] sched/wait: Fix signal handling in bit wait helpers

On Wed, Dec 09 2015, Peter Zijlstra wrote:

> On Wed, Dec 09, 2015 at 12:06:33PM +1100, NeilBrown wrote:
>> On Tue, Dec 08 2015, Peter Zijlstra wrote:
>>
>> >>
>> >
>> > *sigh*, so that patch was broken.. the below might fix it, but please
>> > someone look at it, I seem to have a less than stellar track record
>> > here...
>>
>> This new change seems to be more intrusive than should be needed.
>> Can't we just do:
>>
>>
>> __sched int bit_wait(struct wait_bit_key *word)
>> {
>> + long state = current->state;
>
> No, current->state can already be changed by this time.

Does that matter?
It can only have changed to TASK_RUNNING - right?
In that case signal_pending_state() will return 0 and the bit_wait() acts
as though the thread was woken up normally (which it was) rather than by
a signal (which maybe it was too, but maybe that happened just a tiny
bit later).

As long as signal delivery doesn't change ->state, we should be safe.
We should even be safe testing ->state *after* the call the schedule().

NeilBrown


Attachments:
signature.asc (818.00 B)

2015-12-10 13:09:56

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [tip:locking/core] sched/wait: Fix signal handling in bit wait helpers

On Thu, Dec 10, 2015 at 08:30:01AM +1100, NeilBrown wrote:
> On Wed, Dec 09 2015, Peter Zijlstra wrote:
>
> > On Wed, Dec 09, 2015 at 12:06:33PM +1100, NeilBrown wrote:
> >> On Tue, Dec 08 2015, Peter Zijlstra wrote:
> >>
> >> >>
> >> >
> >> > *sigh*, so that patch was broken.. the below might fix it, but please
> >> > someone look at it, I seem to have a less than stellar track record
> >> > here...
> >>
> >> This new change seems to be more intrusive than should be needed.
> >> Can't we just do:
> >>
> >>
> >> __sched int bit_wait(struct wait_bit_key *word)
> >> {
> >> + long state = current->state;
> >
> > No, current->state can already be changed by this time.
>
> Does that matter?
> It can only have changed to TASK_RUNNING - right?
> In that case signal_pending_state() will return 0 and the bit_wait() acts
> as though the thread was woken up normally (which it was) rather than by
> a signal (which maybe it was too, but maybe that happened just a tiny
> bit later).
>
> As long as signal delivery doesn't change ->state, we should be safe.
> We should even be safe testing ->state *after* the call the schedule().

Blergh, all I've managed to far is to confuse myself further. Even
something like the original (+- the EINTR) should work when we consider
the looping, even when mixed with an occasional spurious wakeup.


int bit_wait()
{
if (signal_pending_state(current->state, current))
return -EINTR;
schedule();
}


This can go wrong against raising a signal thusly:

prepare_to_wait()
1: if (signal_pending_state(current->state, current))
// false, nothing pending
schedule();
set_tsk_thread_flag(t, TIF_SIGPENDING);

<spurious wakeup>

prepare_to_wait()
wake_up_state(t, ...);
2: if (signal_pending_state(current->state, current))
// false, TASK_RUNNING

schedule(); // doesn't block because pending

prepare_to_wait()
3: if (signal_pending_state(current->state, current))
// true, pending

2015-12-11 11:31:10

by Paul Turner

[permalink] [raw]
Subject: Re: [tip:locking/core] sched/wait: Fix signal handling in bit wait helpers

On Thu, Dec 10, 2015 at 5:09 AM, Peter Zijlstra <[email protected]> wrote:
> On Thu, Dec 10, 2015 at 08:30:01AM +1100, NeilBrown wrote:
>> On Wed, Dec 09 2015, Peter Zijlstra wrote:
>>
>> > On Wed, Dec 09, 2015 at 12:06:33PM +1100, NeilBrown wrote:
>> >> On Tue, Dec 08 2015, Peter Zijlstra wrote:
>> >>
>> >> >>
>> >> >
>> >> > *sigh*, so that patch was broken.. the below might fix it, but please
>> >> > someone look at it, I seem to have a less than stellar track record
>> >> > here...
>> >>
>> >> This new change seems to be more intrusive than should be needed.
>> >> Can't we just do:
>> >>
>> >>
>> >> __sched int bit_wait(struct wait_bit_key *word)
>> >> {
>> >> + long state = current->state;
>> >
>> > No, current->state can already be changed by this time.
>>
>> Does that matter?
>> It can only have changed to TASK_RUNNING - right?
>> In that case signal_pending_state() will return 0 and the bit_wait() acts
>> as though the thread was woken up normally (which it was) rather than by
>> a signal (which maybe it was too, but maybe that happened just a tiny
>> bit later).
>>
>> As long as signal delivery doesn't change ->state, we should be safe.
>> We should even be safe testing ->state *after* the call the schedule().
>
> Blergh, all I've managed to far is to confuse myself further. Even
> something like the original (+- the EINTR) should work when we consider
> the looping, even when mixed with an occasional spurious wakeup.
>
>
> int bit_wait()
> {
> if (signal_pending_state(current->state, current))
> return -EINTR;
> schedule();
> }
>
>
> This can go wrong against raising a signal thusly:
>
> prepare_to_wait()
> 1: if (signal_pending_state(current->state, current))
> // false, nothing pending
> schedule();
> set_tsk_thread_flag(t, TIF_SIGPENDING);
>
> <spurious wakeup>
>
> prepare_to_wait()
> wake_up_state(t, ...);
> 2: if (signal_pending_state(current->state, current))
> // false, TASK_RUNNING
>
> schedule(); // doesn't block because pending

Note that a quick inspection does not turn up _any_ TASK_INTERRUPTIBLE
callers. When this previously occurred, it could likely only be with
a fatal signal, which would have hidden these sins.

>
> prepare_to_wait()
> 3: if (signal_pending_state(current->state, current))
> // true, pending
>

Hugh asked me about this after seeing a crash, here's another exciting
way in which the current code breaks -- this one actually quite
serious:

Consider __lock_page:

void __lock_page(struct page *page)
{
DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
__wait_on_bit_lock(page_waitqueue(page), &wait, bit_wait_io,
TASK_UNINTERRUPTIBLE);
}

With the current state of the world,

__sched int bit_wait_io(struct wait_bit_key *word)
{
- if (signal_pending_state(current->state, current))
- return 1;
io_schedule();
+ if (signal_pending(current))
+ return -EINTR;
return 0;
}

Called from __wait_on_bit_lock.

Previously, signal_pending_state() was checked under
TASK_UNINTERRUPTIBLE (via prepare_to_wait_exclusive). Now, we simply
check for the presence of any signal -- after we have returned to
running state, e.g. post io_schedule() when somebody has kicked the
wait-queue.

However, this now means that _wait_on_bit_lock can return -EINTR up to
__lock_page; which does not validate the return code and blindly
returns. This looks to have been a previously existing bug, but it
was at least masked by the fact that it required a fatal signal
previously (and that the page we return unlocked is likely going to be
freed from the dying process anyway).

Peter's proposed follow-up above looks strictly more correct. We need
to evaluate the potential existence of a signal, *after* we return
from schedule, but in the context of the state which we previously
_entered_ schedule() on.

Reviewed-by: Paul Turner <[email protected]>


>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2015-12-11 11:40:22

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [tip:locking/core] sched/wait: Fix signal handling in bit wait helpers

On Fri, Dec 11, 2015 at 03:30:33AM -0800, Paul Turner wrote:

> > Blergh, all I've managed to far is to confuse myself further. Even
> > something like the original (+- the EINTR) should work when we consider
> > the looping, even when mixed with an occasional spurious wakeup.
> >
> >
> > int bit_wait()
> > {
> > if (signal_pending_state(current->state, current))
> > return -EINTR;
> > schedule();
> > }

So I asked Vladimir to test that (simply changing the return from 1 to
-EINTR) and it made his fail much less likely but it still failed in the
same way.

So I'm fairly sure I'm still missing something :/

> Hugh asked me about this after seeing a crash, here's another exciting
> way in which the current code breaks -- this one actually quite
> serious:

Yep, this got reported by Jan and I did kick myself for that.

> Peter's proposed follow-up above looks strictly more correct. We need
> to evaluate the potential existence of a signal, *after* we return
> from schedule, but in the context of the state which we previously
> _entered_ schedule() on.
>
> Reviewed-by: Paul Turner <[email protected]>

Right, its maybe a bit overkill, but at this point I'm a tad
conservative/paranoid.

Vladimir, Jan could you both please that patch?

lkml.kernel.org/r/[email protected]


Thanks!

2015-12-11 11:54:05

by Vladimir Murzin

[permalink] [raw]
Subject: Re: [tip:locking/core] sched/wait: Fix signal handling in bit wait helpers

On 11/12/15 11:39, Peter Zijlstra wrote:
> On Fri, Dec 11, 2015 at 03:30:33AM -0800, Paul Turner wrote:
>
>>> Blergh, all I've managed to far is to confuse myself further. Even
>>> something like the original (+- the EINTR) should work when we consider
>>> the looping, even when mixed with an occasional spurious wakeup.
>>>
>>>
>>> int bit_wait()
>>> {
>>> if (signal_pending_state(current->state, current))
>>> return -EINTR;
>>> schedule();
>>> }
>
> So I asked Vladimir to test that (simply changing the return from 1 to
> -EINTR) and it made his fail much less likely but it still failed in the
> same way.
>
> So I'm fairly sure I'm still missing something :/
>
>> Hugh asked me about this after seeing a crash, here's another exciting
>> way in which the current code breaks -- this one actually quite
>> serious:
>
> Yep, this got reported by Jan and I did kick myself for that.
>
>> Peter's proposed follow-up above looks strictly more correct. We need
>> to evaluate the potential existence of a signal, *after* we return
>> from schedule, but in the context of the state which we previously
>> _entered_ schedule() on.
>>
>> Reviewed-by: Paul Turner <[email protected]>
>
> Right, its maybe a bit overkill, but at this point I'm a tad
> conservative/paranoid.
>
> Vladimir, Jan could you both please that patch?
>
> lkml.kernel.org/r/[email protected]

Already in a queue!

Cheers
Vladimir

>
>
> Thanks!
>
>

2015-12-11 13:09:24

by Jan Stancek

[permalink] [raw]
Subject: Re: [tip:locking/core] sched/wait: Fix signal handling in bit wait helpers





----- Original Message -----
> From: "Peter Zijlstra" <[email protected]>
> To: "Paul Turner" <[email protected]>
> Cc: "NeilBrown" <[email protected]>, "Linus Torvalds" <[email protected]>, "Thomas Gleixner"
> <[email protected]>, "LKML" <[email protected]>, "Mike Galbraith" <[email protected]>, "Ingo Molnar"
> <[email protected]>, "Peter Anvin" <[email protected]>, "vladimir murzin" <[email protected]>,
> [email protected], [email protected], "Oleg Nesterov" <[email protected]>
> Sent: Friday, 11 December, 2015 12:39:59 PM
> Subject: Re: [tip:locking/core] sched/wait: Fix signal handling in bit wait helpers
>
> On Fri, Dec 11, 2015 at 03:30:33AM -0800, Paul Turner wrote:
>
> > > Blergh, all I've managed to far is to confuse myself further. Even
> > > something like the original (+- the EINTR) should work when we consider
> > > the looping, even when mixed with an occasional spurious wakeup.
> > >
> > >
> > > int bit_wait()
> > > {
> > > if (signal_pending_state(current->state, current))
> > > return -EINTR;
> > > schedule();
> > > }
>
> So I asked Vladimir to test that (simply changing the return from 1 to
> -EINTR) and it made his fail much less likely but it still failed in the
> same way.
>
> So I'm fairly sure I'm still missing something :/
>
> > Hugh asked me about this after seeing a crash, here's another exciting
> > way in which the current code breaks -- this one actually quite
> > serious:
>
> Yep, this got reported by Jan and I did kick myself for that.
>
> > Peter's proposed follow-up above looks strictly more correct. We need
> > to evaluate the potential existence of a signal, *after* we return
> > from schedule, but in the context of the state which we previously
> > _entered_ schedule() on.
> >
> > Reviewed-by: Paul Turner <[email protected]>
>
> Right, its maybe a bit overkill, but at this point I'm a tad
> conservative/paranoid.
>
> Vladimir, Jan could you both please that patch?
>
> lkml.kernel.org/r/[email protected]

This appears to exactly match patch I tested against v4.4-rc4 here:
http://marc.info/?l=linux-mm&m=144950957622869&w=2

Anyway, I repeated the test with v4.4-rc4-113-g0bd0f1e as base.
Results look good. With patch applied, I can't trigger
"kernel BUG at mm/filemap.c:238!" anymore.

Regards,
Jan

>
>
> Thanks!
>
>
>

2015-12-11 13:22:41

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [tip:locking/core] sched/wait: Fix signal handling in bit wait helpers

On Fri, Dec 11, 2015 at 08:08:36AM -0500, Jan Stancek wrote:
> > lkml.kernel.org/r/[email protected]
>
> This appears to exactly match patch I tested against v4.4-rc4 here:
> http://marc.info/?l=linux-mm&m=144950957622869&w=2

Ah, I forgot I posted the thing twice and you already tested it :/

> Anyway, I repeated the test with v4.4-rc4-113-g0bd0f1e as base.
> Results look good. With patch applied, I can't trigger
> "kernel BUG at mm/filemap.c:238!" anymore.

Thanks!

2015-12-11 17:57:15

by Vladimir Murzin

[permalink] [raw]
Subject: Re: [tip:locking/core] sched/wait: Fix signal handling in bit wait helpers

On 11/12/15 11:39, Peter Zijlstra wrote:
> On Fri, Dec 11, 2015 at 03:30:33AM -0800, Paul Turner wrote:
>
>>> Blergh, all I've managed to far is to confuse myself further. Even
>>> something like the original (+- the EINTR) should work when we consider
>>> the looping, even when mixed with an occasional spurious wakeup.
>>>
>>>
>>> int bit_wait()
>>> {
>>> if (signal_pending_state(current->state, current))
>>> return -EINTR;
>>> schedule();
>>> }
>
> So I asked Vladimir to test that (simply changing the return from 1 to
> -EINTR) and it made his fail much less likely but it still failed in the
> same way.
>
> So I'm fairly sure I'm still missing something :/
>
>> Hugh asked me about this after seeing a crash, here's another exciting
>> way in which the current code breaks -- this one actually quite
>> serious:
>
> Yep, this got reported by Jan and I did kick myself for that.
>
>> Peter's proposed follow-up above looks strictly more correct. We need
>> to evaluate the potential existence of a signal, *after* we return
>> from schedule, but in the context of the state which we previously
>> _entered_ schedule() on.
>>
>> Reviewed-by: Paul Turner <[email protected]>
>
> Right, its maybe a bit overkill, but at this point I'm a tad
> conservative/paranoid.
>
> Vladimir, Jan could you both please that patch?
>
> lkml.kernel.org/r/[email protected]
>

By this time my test has been run ~500 times without any stalls. I'll
keep running overnight (just in case), but I think that patch can be
marked as tested.

Cheers
Vladimir

>
> Thanks!
>
>

2015-12-15 16:56:31

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [BISECTED] rcu_sched self-detected stall since 3.17

Sorry again for the huge delay.

And all I can say is that I am all confused.

On 12/01, Peter Zijlstra wrote:
>
> On Fri, Nov 20, 2015 at 03:35:38PM +0000, Vladimir Murzin wrote:
> > commit 743162013d40ca612b4cb53d3a200dff2d9ab26e
> > Author: NeilBrown <[email protected]>
> > Date: Mon Jul 7 15:16:04 2014 +1000

That patch still looks correct to me.

> > and if I apply following diff I don't see stalls anymore.
> >
> > diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
> > index a104879..2d68cdb 100644
> > --- a/kernel/sched/wait.c
> > +++ b/kernel/sched/wait.c
> > @@ -514,9 +514,10 @@ EXPORT_SYMBOL(bit_wait);
> >
> > __sched int bit_wait_io(void *word)
> > {
> > + io_schedule();
> > +
> > if (signal_pending_state(current->state, current))
> > return 1;
> > - io_schedule();
> > return 0;
> > }
> > EXPORT_SYMBOL(bit_wait_io);

I can't understand why this change helps. But note that it actually removes
the signal_pending_state() check from bit_wait_io(), current->state is always
TASK_RUNNING after return from schedule(), signal_pending_state() will always
return zero.

This means that after this change wait_on_page_bit_killable() will spin in a
busy-wait loop if the caller is killed.

> The reason this is broken is that schedule() will no-op when there is a
> pending signal, while raising a signal will also issue a wakeup.

But why this is wrong? We should notice signal_pending_state() on the next
iteration.

> Thus the right thing to do is check for the signal state after,

I think this check should work on both sides. The only difference is that
you obviously can't use current->state after schedule().

I still can't understand the problem.

Oleg.

2015-12-15 18:16:12

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [tip:locking/core] sched/wait: Fix signal handling in bit wait helpers

On 12/11, Peter Zijlstra wrote:
>
> On Fri, Dec 11, 2015 at 03:30:33AM -0800, Paul Turner wrote:
>
> > > Blergh, all I've managed to far is to confuse myself further. Even
> > > something like the original (+- the EINTR) should work when we consider
> > > the looping, even when mixed with an occasional spurious wakeup.
> > >
> > >
> > > int bit_wait()
> > > {
> > > if (signal_pending_state(current->state, current))
> > > return -EINTR;
> > > schedule();
> > > }
>
> So I asked Vladimir to test that (simply changing the return from 1 to
> -EINTR) and it made his fail much less likely but it still failed in the
> same way.
>
> So I'm fairly sure I'm still missing something :/

Same here...

Yes, "return 1" in bit_wait_io() doesn't look right. For example
do_generic_file_read() can wrongly return if lock_page_killable() returns
this error code. But I fail to understand how this can read to rcu-stall.

Oleg.

2015-12-15 19:01:15

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [tip:locking/core] sched/wait: Fix signal handling in bit wait helpers

On 12/11, Paul Turner wrote:
>
> Peter's proposed follow-up above looks strictly more correct. We need
> to evaluate the potential existence of a signal, *after* we return
> from schedule,

I still don't understand this...

signal_pending_check(current->state) before schedule() should be fine
even if it actually reads current->state twice and it races with wakeup/
signal_wake_up() which can change the caller's state.

> but in the context of the state which we previously
> _entered_ schedule() on.

Yes, but only if we do this after return from schedule().


But somehow this change helps. It adds the subtle difference(s), for example
__wait_on_bit_lock() won't do another test_and_set_bit() if the sleeping
caller is killed, but this shouldn't matter.

And if this does matter because it has a buggy user, then it is not clear why
the change from Vladimir helps too.

The common part is that both changes make "return 1" impossible, but according
to another email from Peter this just makes the fail less likely.

I am really puzzled.

Oleg.