2021-09-29 16:11:41

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH v2 04/11] sched: Simplify wake_up_*idle*()

Simplify and make wake_up_if_idle() more robust, also don't iterate
the whole machine with preempt_disable() in it's caller:
wake_up_all_idle_cpus().

This prepares for another wake_up_if_idle() user that needs a full
do_idle() cycle.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
kernel/sched/core.c | 14 +++++---------
kernel/smp.c | 6 +++---
2 files changed, 8 insertions(+), 12 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3691,15 +3691,11 @@ void wake_up_if_idle(int cpu)
if (!is_idle_task(rcu_dereference(rq->curr)))
goto out;

- if (set_nr_if_polling(rq->idle)) {
- trace_sched_wake_idle_without_ipi(cpu);
- } else {
- rq_lock_irqsave(rq, &rf);
- if (is_idle_task(rq->curr))
- smp_send_reschedule(cpu);
- /* Else CPU is not idle, do nothing here: */
- rq_unlock_irqrestore(rq, &rf);
- }
+ rq_lock_irqsave(rq, &rf);
+ if (is_idle_task(rq->curr))
+ resched_curr(rq);
+ /* Else CPU is not idle, do nothing here: */
+ rq_unlock_irqrestore(rq, &rf);

out:
rcu_read_unlock();
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -1170,14 +1170,14 @@ void wake_up_all_idle_cpus(void)
{
int cpu;

- preempt_disable();
+ cpus_read_lock();
for_each_online_cpu(cpu) {
- if (cpu == smp_processor_id())
+ if (cpu == raw_smp_processor_id())
continue;

wake_up_if_idle(cpu);
}
- preempt_enable();
+ cpus_read_unlock();
}
EXPORT_SYMBOL_GPL(wake_up_all_idle_cpus);




2021-10-09 10:10:57

by tip-bot2 for Jacob Pan

[permalink] [raw]
Subject: [tip: sched/core] sched: Simplify wake_up_*idle*()

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 8850cb663b5cda04d33f9cfbc38889d73d3c8e24
Gitweb: https://git.kernel.org/tip/8850cb663b5cda04d33f9cfbc38889d73d3c8e24
Author: Peter Zijlstra <[email protected]>
AuthorDate: Tue, 21 Sep 2021 22:16:02 +02:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Thu, 07 Oct 2021 13:51:15 +02:00

sched: Simplify wake_up_*idle*()

Simplify and make wake_up_if_idle() more robust, also don't iterate
the whole machine with preempt_disable() in it's caller:
wake_up_all_idle_cpus().

This prepares for another wake_up_if_idle() user that needs a full
do_idle() cycle.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Vasily Gorbik <[email protected]>
Tested-by: Vasily Gorbik <[email protected]> # on s390
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/core.c | 14 +++++---------
kernel/smp.c | 6 +++---
2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 74db3c3..3b55ef9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3695,15 +3695,11 @@ void wake_up_if_idle(int cpu)
if (!is_idle_task(rcu_dereference(rq->curr)))
goto out;

- if (set_nr_if_polling(rq->idle)) {
- trace_sched_wake_idle_without_ipi(cpu);
- } else {
- rq_lock_irqsave(rq, &rf);
- if (is_idle_task(rq->curr))
- smp_send_reschedule(cpu);
- /* Else CPU is not idle, do nothing here: */
- rq_unlock_irqrestore(rq, &rf);
- }
+ rq_lock_irqsave(rq, &rf);
+ if (is_idle_task(rq->curr))
+ resched_curr(rq);
+ /* Else CPU is not idle, do nothing here: */
+ rq_unlock_irqrestore(rq, &rf);

out:
rcu_read_unlock();
diff --git a/kernel/smp.c b/kernel/smp.c
index f43ede0..ad0b68a 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -1170,14 +1170,14 @@ void wake_up_all_idle_cpus(void)
{
int cpu;

- preempt_disable();
+ cpus_read_lock();
for_each_online_cpu(cpu) {
- if (cpu == smp_processor_id())
+ if (cpu == raw_smp_processor_id())
continue;

wake_up_if_idle(cpu);
}
- preempt_enable();
+ cpus_read_unlock();
}
EXPORT_SYMBOL_GPL(wake_up_all_idle_cpus);

2021-10-13 14:37:28

by Qian Cai

[permalink] [raw]
Subject: Re: [PATCH v2 04/11] sched: Simplify wake_up_*idle*()



On 9/29/2021 11:17 AM, Peter Zijlstra wrote:
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -1170,14 +1170,14 @@ void wake_up_all_idle_cpus(void)
> {
> int cpu;
>
> - preempt_disable();
> + cpus_read_lock();
> for_each_online_cpu(cpu) {
> - if (cpu == smp_processor_id())
> + if (cpu == raw_smp_processor_id())
> continue;
>
> wake_up_if_idle(cpu);
> }
> - preempt_enable();
> + cpus_read_unlock();

Peter, it looks like this thing introduced a deadlock during CPU online/offline.

[ 630.145166][ T129] WARNING: possible recursive locking detected
[ 630.151164][ T129] 5.15.0-rc5-next-20211013+ #145 Not tainted
[ 630.156988][ T129] --------------------------------------------
[ 630.162984][ T129] cpuhp/21/129 is trying to acquire lock:
[ 630.168547][ T129] ffff800011f466d0 (cpu_hotplug_lock){++++}-{0:0}, at: wake_up_all_idle_cpus+0x40/0xe8
wake_up_all_idle_cpus at /usr/src/linux-next/kernel/smp.c:1174
[ 630.178040][ T129]
[ 630.178040][){++++}-{0:0}, at help us debug this:
[ 630.202292][ T129] Possible unsafe locking scenario:
[ 630.202292][ T129]
[ 630.209590][ T129] CPU0
[ 630.212720][ T129] ----
[ 630.215851][ T129] lock(cpu_hotplug_lock);
[ 630.220202][ T129] lock(cpu_hotplug_lock);
[ 630.224553][ T129]
[ 630.224553][ T129] *** DEADLOCK ***
[ 630.224553][ T129]
[ 630.232545][ T129] May be due to missing lock nesting notation
[ 630.232545][ T129]
[ 630.240711][ T129] 3 locks held by cpuhp/21/129:
[ 630.245406][ T129] #0: ffff800011f466d0 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0xe0/0x588
[ 630.254976][ T129] #1: ffff800011f46780 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0xe0/0x588
[ 630.264372][ T129] #2: ffff8000191fb9c8 (cpuidle_lock){+.+.}-{3:3}, at: cpuidle_pause_and_lock+0x24/0x38
[ 630.274031][ T129]
[ 630.274031][ T129] stack backtrace:
[ 630.279767][ T129] CPU: 21 PID: 129 Comm: cpuhp/21 Not tainted 5.15.0-rc5-next-20211013+ #145
[ 630.288371][ T129] Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
[ 630.296886][ T129] Call trace:
[ 630.300017][ T129] dump_backtrace+0x0/0x3b8
[ 630.304369][ T129] show_stack+0x20/0x30
[ 630.308371][ T129] dump_stack_lvl+0x8c/0xb8
[ 630.312722][ T129] dump_stack+0x1c/0x38
[ 630.316723][ T129] validate_chain+0x1d84/0x1da0
[ 630.321421][ T129] __lock_acquire+0xab0/0x2040
[ 630.326033][ T129] lock_acquire+0x32c/0xb08
[ 630.330390][ T129] cpus_read_lock+0x94/0x308
[ 630.334827][ T129] wake_up_all_idle_cpus+0x40/0xe8
[ 630.339784][ T129] cpuidle_uninstall_idle_handler+0x3c/0x50
[ 630.345524][ T129] cpuidle_pause_and_lock+0x28/0x38
[ 630.350569][ T129] acpi_processor_hotplug+0xc0/0x170
[ 630.355701][ T129] acpi_soft_cpu_online+0x124/0x250
[ 630.360745][ T129] cpuhp_invoke_callback+0x51c/0x2ab8
[ 630.365963][ T129] cpuhp_thread_fun+0x204/0x588
[ 630.370659][ T129] smpboot_thread_fn+0x3f0/0xc40
[ 630.375444][ T129] kthread+0x3d8/0x488
[ 630.379360][ T129] ret_from_fork+0x10/0x20
[ 863.525716][ T191] INFO: task cpuhp/21:129 blocked for more than 122 seconds.
[ 863.532954][ T191] Not tainted 5.15.0-rc5-next-20211013+ #145
[ 863.539361][ T191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 863.547927][ T191] task:cpuhp/21 state:D stack:59104 pid: 129 ppid: 2 flags:0x00000008
[ 863.557029][ T191] Call trace:
[ 863.560171][ T191] __switch_to+0x184/0x400
[ 863.564448][ T191] __schedule+0x74c/0x1940
[ 863.568753][ T191] schedule+0x110/0x318
[ 863.572764][ T191] percpu_rwsem_wait+0x1b8/0x348
[ 863.577592][ T191] __percpu_down_read+0xb0/0x148
[ 863.582386][ T191] cpus_read_lock+0x2b0/0x308
[ 863.586961][ T191] wake_up_all_idle_cpus+0x40/0xe8
[ 863.591931][ T191] cpuidle_uninstall_idle_handler+0x3c/0x50
[ 863.597716][ T191] cpuidle_pause_and_lock+0x28/0x38
[ 863.602771][ T191] acpi_processor_hotplug+0xc0/0x170
[ 863.607946][ T191] acpi_soft_cpu_online+0x124/0x250
[ 863.613001][ T191] cpuhp_invoke_callback+0x51c/0x2ab8
[ 863.618261][ T191] cpuhp_thread_fun+0x204/0x588
[ 863.622967][ T191] smpboot_thread_fn+0x3f0/0xc40
[ 863.627787][ T191] kthread+0x3d8/0x488
[ 863.631712][ T191] ret_from_fork+0x10/0x20
[ 863.636020][ T191] INFO: task kworker/0:2:189 blocked for more than 122 seconds.
[ 863.643500][ T191] Not tainted 5.15.0-rc5-next-20211013+ #145
[ 863.649882][ T191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 863.658425][ T191] task:kworker/0:2 state:D stack:58368 pid: 189 ppid: 2 flags:0x00000008
[ 863.667516][ T191] Workqueue: events vmstat_shepherd
[ 863.672573][ T191] Call trace:
[ 863.675731][ T191] __switch_to+0x184/0x400
[ 863.680001][ T191] __schedule+0x74c/0x1940
[ 863.684268][ T191] schedule+0x110/0x318
[ 863.688295][ T191] percpu_rwsem_wait+0x1b8/0x348
[ 863.693085][ T191] __percpu_down_read+0xb0/0x148
[ 863.697892][ T191] cpus_read_lock+0x2b0/0x308
[ 863.702421][ T191] vmstat_shepherd+0x5c/0x1a8
[ 863.706977][ T191] process_one_work+0x808/0x19d0
[ 863.711767][ T191] worker_thread+0x334/0xae0
[ 863.716227][ T191] kthread+0x3d8/0x488
[ 863.720149][ T191] ret_from_fork+0x10/0x20
[ 863.724487][ T191] INFO: task lsbug:4642 blocked for more than 123 seconds.
[ 863.731565][ T191] Not tainted 5.15.0-rc5-next-20211013+ #145
[ 863.737938][ T191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 863.746490][ T191] task:lsbug state:D stack:55536 pid: 4642 ppid: 4638 flags:0x00000008
[ 863.755549][ T191] Call trace:
[ 863.758712][ T191] __switch_to+0x184/0x400
[ 863.762984][ T191] __schedule+0x74c/0x1940
[ 863.767286][ T191] schedule+0x110/0x318
[ 863.771294][ T191] schedule_timeout+0x188/0x238
[ 863.776016][ T191] wait_for_completion+0x174/0x290
[ 863.780979][ T191] __cpuhp_kick_ap+0x158/0x1a8
[ 863.785592][ T191] cpuhp_kick_ap+0x1f0/0x828
[ 863.790053][ T191] bringup_cpu+0x180/0x1e0
[ 863.794320][ T191] cpuhp_invoke_callback+0x51c/0x2ab8
[ 863.799561][ T191] cpuhp_invoke_callback_range+0xa4/0x108
[ 863.805130][ T191] cpu_up+0x528/0xd78
[ 863.808982][ T191] cpu_device_up+0x4c/0x68
[ 863.813249][ T191] cpu_subsys_online+0xc0/0x1f8
[ 863.817972][ T191] device_online+0x10c/0x180
[ 863.822413][ T191] online_store+0x10c/0x118
[ 863.826791][ T191] dev_attr_store+0x44/0x78
[ 863.831148][ T191] sysfs_kf_write+0xe8/0x138
[ 863.835590][ T191] kernfs_fop_write_iter+0x26c/0x3d0
[ 863.840745][ T191] new_sync_write+0x2bc/0x4f8
[ 863.845275][ T191] vfs_write+0x714/0xcd8
[ 863.849387][ T191] ksys_write+0xf8/0x1e0
[ 863.853481][ T191] __arm64_sys_write+0x74/0xa8
[ 863.858113][ T191] invoke_syscall.constprop.0+0xdc/0x1d8
[ 863.863597][ T191] do_el0_svc+0xe4/0x298
[ 863.867710][ T191] el0_svc+0x64/0x130
[ 863.871545][ T191] el0t_64_sync_handler+0xb0/0xb8
[ 863.876437][ T191] el0t_64_sync+0x180/0x184
[ 863.880798][ T191] INFO: task mount:4682 blocked for more than 123 seconds.
[ 863.887881][ T191] Not tainted 5.15.0-rc5-next-20211013+ #145
[ 863.894232][ T191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 863.902776][ T191] task:mount state:D stack:55856 pid: 4682 ppid: 1101 flags:0x00000000
[ 863.911865][ T191] Call trace:
[ 863.915003][ T191] __switch_to+0x184/0x400
[ 863.919296][ T191] __schedule+0x74c/0x1940
[ 863.923564][ T191] schedule+0x110/0x318
[ 863.927590][ T191] percpu_rwsem_wait+0x1b8/0x348
[ 863.932380][ T191] __percpu_down_read+0xb0/0x148
[ 863.937187][ T191] cpus_read_lock+0x2b0/0x308
[ 863.941715][ T191] alloc_workqueue+0x730/0xd48
[ 863.946357][ T191] loop_configure+0x2d4/0x1180 [loop]
[ 863.951592][ T191] lo_ioctl+0x5dc/0x1228 [loop]
[ 863.956321][ T191] blkdev_ioctl+0x258/0x820
[ 863.960678][ T191] __arm64_sys_ioctl+0x114/0x180
[ 863.965468][ T191] invoke_syscall.constprop.0+0xdc/0x1d8
[ 863.970974][ T191] do_el0_svc+0xe4/0x298
[ 863.975069][ T191] el0_svc+0x64/0x130
[ 863.978922][ T191] el0t_64_sync_handler+0xb0/0xb8
[ 863.983798][ T191] el0t_64_sync+0x180/0x184
[ 863.988172][ T191] INFO: lockdep is turned off.

2021-10-19 03:50:07

by Qian Cai

[permalink] [raw]
Subject: Re: [PATCH v2 04/11] sched: Simplify wake_up_*idle*()

Peter, any thoughts? I did confirm that reverting the commit fixed the issue.

On 10/13/2021 10:32 AM, Qian Cai wrote:
>
>
> On 9/29/2021 11:17 AM, Peter Zijlstra wrote:
>> --- a/kernel/smp.c
>> +++ b/kernel/smp.c
>> @@ -1170,14 +1170,14 @@ void wake_up_all_idle_cpus(void)
>> {
>> int cpu;
>>
>> - preempt_disable();
>> + cpus_read_lock();
>> for_each_online_cpu(cpu) {
>> - if (cpu == smp_processor_id())
>> + if (cpu == raw_smp_processor_id())
>> continue;
>>
>> wake_up_if_idle(cpu);
>> }
>> - preempt_enable();
>> + cpus_read_unlock();
>
> Peter, it looks like this thing introduced a deadlock during CPU online/offline.
>
> [ 630.145166][ T129] WARNING: possible recursive locking detected
> [ 630.151164][ T129] 5.15.0-rc5-next-20211013+ #145 Not tainted
> [ 630.156988][ T129] --------------------------------------------
> [ 630.162984][ T129] cpuhp/21/129 is trying to acquire lock:
> [ 630.168547][ T129] ffff800011f466d0 (cpu_hotplug_lock){++++}-{0:0}, at: wake_up_all_idle_cpus+0x40/0xe8
> wake_up_all_idle_cpus at /usr/src/linux-next/kernel/smp.c:1174
> [ 630.178040][ T129]
> [ 630.178040][){++++}-{0:0}, at help us debug this:
> [ 630.202292][ T129] Possible unsafe locking scenario:
> [ 630.202292][ T129]
> [ 630.209590][ T129] CPU0
> [ 630.212720][ T129] ----
> [ 630.215851][ T129] lock(cpu_hotplug_lock);
> [ 630.220202][ T129] lock(cpu_hotplug_lock);
> [ 630.224553][ T129]
> [ 630.224553][ T129] *** DEADLOCK ***
> [ 630.224553][ T129]
> [ 630.232545][ T129] May be due to missing lock nesting notation
> [ 630.232545][ T129]
> [ 630.240711][ T129] 3 locks held by cpuhp/21/129:
> [ 630.245406][ T129] #0: ffff800011f466d0 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0xe0/0x588
> [ 630.254976][ T129] #1: ffff800011f46780 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0xe0/0x588
> [ 630.264372][ T129] #2: ffff8000191fb9c8 (cpuidle_lock){+.+.}-{3:3}, at: cpuidle_pause_and_lock+0x24/0x38
> [ 630.274031][ T129]
> [ 630.274031][ T129] stack backtrace:
> [ 630.279767][ T129] CPU: 21 PID: 129 Comm: cpuhp/21 Not tainted 5.15.0-rc5-next-20211013+ #145
> [ 630.288371][ T129] Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
> [ 630.296886][ T129] Call trace:
> [ 630.300017][ T129] dump_backtrace+0x0/0x3b8
> [ 630.304369][ T129] show_stack+0x20/0x30
> [ 630.308371][ T129] dump_stack_lvl+0x8c/0xb8
> [ 630.312722][ T129] dump_stack+0x1c/0x38
> [ 630.316723][ T129] validate_chain+0x1d84/0x1da0
> [ 630.321421][ T129] __lock_acquire+0xab0/0x2040
> [ 630.326033][ T129] lock_acquire+0x32c/0xb08
> [ 630.330390][ T129] cpus_read_lock+0x94/0x308
> [ 630.334827][ T129] wake_up_all_idle_cpus+0x40/0xe8
> [ 630.339784][ T129] cpuidle_uninstall_idle_handler+0x3c/0x50
> [ 630.345524][ T129] cpuidle_pause_and_lock+0x28/0x38
> [ 630.350569][ T129] acpi_processor_hotplug+0xc0/0x170
> [ 630.355701][ T129] acpi_soft_cpu_online+0x124/0x250
> [ 630.360745][ T129] cpuhp_invoke_callback+0x51c/0x2ab8
> [ 630.365963][ T129] cpuhp_thread_fun+0x204/0x588
> [ 630.370659][ T129] smpboot_thread_fn+0x3f0/0xc40
> [ 630.375444][ T129] kthread+0x3d8/0x488
> [ 630.379360][ T129] ret_from_fork+0x10/0x20
> [ 863.525716][ T191] INFO: task cpuhp/21:129 blocked for more than 122 seconds.
> [ 863.532954][ T191] Not tainted 5.15.0-rc5-next-20211013+ #145
> [ 863.539361][ T191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 863.547927][ T191] task:cpuhp/21 state:D stack:59104 pid: 129 ppid: 2 flags:0x00000008
> [ 863.557029][ T191] Call trace:
> [ 863.560171][ T191] __switch_to+0x184/0x400
> [ 863.564448][ T191] __schedule+0x74c/0x1940
> [ 863.568753][ T191] schedule+0x110/0x318
> [ 863.572764][ T191] percpu_rwsem_wait+0x1b8/0x348
> [ 863.577592][ T191] __percpu_down_read+0xb0/0x148
> [ 863.582386][ T191] cpus_read_lock+0x2b0/0x308
> [ 863.586961][ T191] wake_up_all_idle_cpus+0x40/0xe8
> [ 863.591931][ T191] cpuidle_uninstall_idle_handler+0x3c/0x50
> [ 863.597716][ T191] cpuidle_pause_and_lock+0x28/0x38
> [ 863.602771][ T191] acpi_processor_hotplug+0xc0/0x170
> [ 863.607946][ T191] acpi_soft_cpu_online+0x124/0x250
> [ 863.613001][ T191] cpuhp_invoke_callback+0x51c/0x2ab8
> [ 863.618261][ T191] cpuhp_thread_fun+0x204/0x588
> [ 863.622967][ T191] smpboot_thread_fn+0x3f0/0xc40
> [ 863.627787][ T191] kthread+0x3d8/0x488
> [ 863.631712][ T191] ret_from_fork+0x10/0x20
> [ 863.636020][ T191] INFO: task kworker/0:2:189 blocked for more than 122 seconds.
> [ 863.643500][ T191] Not tainted 5.15.0-rc5-next-20211013+ #145
> [ 863.649882][ T191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 863.658425][ T191] task:kworker/0:2 state:D stack:58368 pid: 189 ppid: 2 flags:0x00000008
> [ 863.667516][ T191] Workqueue: events vmstat_shepherd
> [ 863.672573][ T191] Call trace:
> [ 863.675731][ T191] __switch_to+0x184/0x400
> [ 863.680001][ T191] __schedule+0x74c/0x1940
> [ 863.684268][ T191] schedule+0x110/0x318
> [ 863.688295][ T191] percpu_rwsem_wait+0x1b8/0x348
> [ 863.693085][ T191] __percpu_down_read+0xb0/0x148
> [ 863.697892][ T191] cpus_read_lock+0x2b0/0x308
> [ 863.702421][ T191] vmstat_shepherd+0x5c/0x1a8
> [ 863.706977][ T191] process_one_work+0x808/0x19d0
> [ 863.711767][ T191] worker_thread+0x334/0xae0
> [ 863.716227][ T191] kthread+0x3d8/0x488
> [ 863.720149][ T191] ret_from_fork+0x10/0x20
> [ 863.724487][ T191] INFO: task lsbug:4642 blocked for more than 123 seconds.
> [ 863.731565][ T191] Not tainted 5.15.0-rc5-next-20211013+ #145
> [ 863.737938][ T191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 863.746490][ T191] task:lsbug state:D stack:55536 pid: 4642 ppid: 4638 flags:0x00000008
> [ 863.755549][ T191] Call trace:
> [ 863.758712][ T191] __switch_to+0x184/0x400
> [ 863.762984][ T191] __schedule+0x74c/0x1940
> [ 863.767286][ T191] schedule+0x110/0x318
> [ 863.771294][ T191] schedule_timeout+0x188/0x238
> [ 863.776016][ T191] wait_for_completion+0x174/0x290
> [ 863.780979][ T191] __cpuhp_kick_ap+0x158/0x1a8
> [ 863.785592][ T191] cpuhp_kick_ap+0x1f0/0x828
> [ 863.790053][ T191] bringup_cpu+0x180/0x1e0
> [ 863.794320][ T191] cpuhp_invoke_callback+0x51c/0x2ab8
> [ 863.799561][ T191] cpuhp_invoke_callback_range+0xa4/0x108
> [ 863.805130][ T191] cpu_up+0x528/0xd78
> [ 863.808982][ T191] cpu_device_up+0x4c/0x68
> [ 863.813249][ T191] cpu_subsys_online+0xc0/0x1f8
> [ 863.817972][ T191] device_online+0x10c/0x180
> [ 863.822413][ T191] online_store+0x10c/0x118
> [ 863.826791][ T191] dev_attr_store+0x44/0x78
> [ 863.831148][ T191] sysfs_kf_write+0xe8/0x138
> [ 863.835590][ T191] kernfs_fop_write_iter+0x26c/0x3d0
> [ 863.840745][ T191] new_sync_write+0x2bc/0x4f8
> [ 863.845275][ T191] vfs_write+0x714/0xcd8
> [ 863.849387][ T191] ksys_write+0xf8/0x1e0
> [ 863.853481][ T191] __arm64_sys_write+0x74/0xa8
> [ 863.858113][ T191] invoke_syscall.constprop.0+0xdc/0x1d8
> [ 863.863597][ T191] do_el0_svc+0xe4/0x298
> [ 863.867710][ T191] el0_svc+0x64/0x130
> [ 863.871545][ T191] el0t_64_sync_handler+0xb0/0xb8
> [ 863.876437][ T191] el0t_64_sync+0x180/0x184
> [ 863.880798][ T191] INFO: task mount:4682 blocked for more than 123 seconds.
> [ 863.887881][ T191] Not tainted 5.15.0-rc5-next-20211013+ #145
> [ 863.894232][ T191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 863.902776][ T191] task:mount state:D stack:55856 pid: 4682 ppid: 1101 flags:0x00000000
> [ 863.911865][ T191] Call trace:
> [ 863.915003][ T191] __switch_to+0x184/0x400
> [ 863.919296][ T191] __schedule+0x74c/0x1940
> [ 863.923564][ T191] schedule+0x110/0x318
> [ 863.927590][ T191] percpu_rwsem_wait+0x1b8/0x348
> [ 863.932380][ T191] __percpu_down_read+0xb0/0x148
> [ 863.937187][ T191] cpus_read_lock+0x2b0/0x308
> [ 863.941715][ T191] alloc_workqueue+0x730/0xd48
> [ 863.946357][ T191] loop_configure+0x2d4/0x1180 [loop]
> [ 863.951592][ T191] lo_ioctl+0x5dc/0x1228 [loop]
> [ 863.956321][ T191] blkdev_ioctl+0x258/0x820
> [ 863.960678][ T191] __arm64_sys_ioctl+0x114/0x180
> [ 863.965468][ T191] invoke_syscall.constprop.0+0xdc/0x1d8
> [ 863.970974][ T191] do_el0_svc+0xe4/0x298
> [ 863.975069][ T191] el0_svc+0x64/0x130
> [ 863.978922][ T191] el0t_64_sync_handler+0xb0/0xb8
> [ 863.983798][ T191] el0t_64_sync+0x180/0x184
> [ 863.988172][ T191] INFO: lockdep is turned off.
>

2021-10-19 09:00:21

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 04/11] sched: Simplify wake_up_*idle*()

On Mon, Oct 18, 2021 at 11:47:32PM -0400, Qian Cai wrote:
> Peter, any thoughts? I did confirm that reverting the commit fixed the issue.
>
> On 10/13/2021 10:32 AM, Qian Cai wrote:
> >
> >
> > On 9/29/2021 11:17 AM, Peter Zijlstra wrote:
> >> --- a/kernel/smp.c
> >> +++ b/kernel/smp.c
> >> @@ -1170,14 +1170,14 @@ void wake_up_all_idle_cpus(void)
> >> {
> >> int cpu;
> >>
> >> - preempt_disable();
> >> + cpus_read_lock();
> >> for_each_online_cpu(cpu) {
> >> - if (cpu == smp_processor_id())
> >> + if (cpu == raw_smp_processor_id())
> >> continue;
> >>
> >> wake_up_if_idle(cpu);
> >> }
> >> - preempt_enable();
> >> + cpus_read_unlock();

Right, so yesterday I thought: YW2KGrvvv/[email protected]
but today I might have another idea, lemme go prod at this a bit more.

2021-10-19 09:17:49

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 04/11] sched: Simplify wake_up_*idle*()

On Tue, Oct 19, 2021 at 10:56:48AM +0200, Peter Zijlstra wrote:
> On Mon, Oct 18, 2021 at 11:47:32PM -0400, Qian Cai wrote:
> > Peter, any thoughts? I did confirm that reverting the commit fixed the issue.
> >
> > On 10/13/2021 10:32 AM, Qian Cai wrote:
> > >
> > >
> > > On 9/29/2021 11:17 AM, Peter Zijlstra wrote:
> > >> --- a/kernel/smp.c
> > >> +++ b/kernel/smp.c
> > >> @@ -1170,14 +1170,14 @@ void wake_up_all_idle_cpus(void)
> > >> {
> > >> int cpu;
> > >>
> > >> - preempt_disable();
> > >> + cpus_read_lock();
> > >> for_each_online_cpu(cpu) {
> > >> - if (cpu == smp_processor_id())
> > >> + if (cpu == raw_smp_processor_id())
> > >> continue;
> > >>
> > >> wake_up_if_idle(cpu);
> > >> }
> > >> - preempt_enable();
> > >> + cpus_read_unlock();
>
> Right, so yesterday I thought: YW2KGrvvv/[email protected]
> but today I might have another idea, lemme go prod at this a bit more.

Here, hows this then?

---
diff --git a/kernel/smp.c b/kernel/smp.c
index ad0b68a3a3d3..61ddc7a3bafa 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -1170,14 +1170,12 @@ void wake_up_all_idle_cpus(void)
{
int cpu;

- cpus_read_lock();
- for_each_online_cpu(cpu) {
- if (cpu == raw_smp_processor_id())
- continue;
-
- wake_up_if_idle(cpu);
+ for_each_cpu(cpu) {
+ preempt_disable();
+ if (cpu != smp_processor_id() && cpu_online(cpu))
+ wake_up_if_idle(cpu);
+ preempt_enable();
}
- cpus_read_unlock();
}
EXPORT_SYMBOL_GPL(wake_up_all_idle_cpus);

2021-10-19 15:34:16

by Qian Cai

[permalink] [raw]
Subject: Re: [PATCH v2 04/11] sched: Simplify wake_up_*idle*()



On 10/19/2021 5:10 AM, Peter Zijlstra wrote:
> Here, hows this then?
>
> ---
> diff --git a/kernel/smp.c b/kernel/smp.c
> index ad0b68a3a3d3..61ddc7a3bafa 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -1170,14 +1170,12 @@ void wake_up_all_idle_cpus(void)
> {
> int cpu;
>
> - cpus_read_lock();
> - for_each_online_cpu(cpu) {
> - if (cpu == raw_smp_processor_id())
> - continue;
> -
> - wake_up_if_idle(cpu);
> + for_each_cpu(cpu) {
> + preempt_disable();
> + if (cpu != smp_processor_id() && cpu_online(cpu))
> + wake_up_if_idle(cpu);
> + preempt_enable();
> }
> - cpus_read_unlock();
> }
> EXPORT_SYMBOL_GPL(wake_up_all_idle_cpus);

This does not compile.

kernel/smp.c:1173:18: error: macro "for_each_cpu" requires 2 arguments, but only 1 given

2021-10-19 15:53:22

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 04/11] sched: Simplify wake_up_*idle*()

On Tue, Oct 19, 2021 at 11:32:15AM -0400, Qian Cai wrote:
>
>
> On 10/19/2021 5:10 AM, Peter Zijlstra wrote:
> > Here, hows this then?
> >
> > ---
> > diff --git a/kernel/smp.c b/kernel/smp.c
> > index ad0b68a3a3d3..61ddc7a3bafa 100644
> > --- a/kernel/smp.c
> > +++ b/kernel/smp.c
> > @@ -1170,14 +1170,12 @@ void wake_up_all_idle_cpus(void)
> > {
> > int cpu;
> >
> > - cpus_read_lock();
> > - for_each_online_cpu(cpu) {
> > - if (cpu == raw_smp_processor_id())
> > - continue;
> > -
> > - wake_up_if_idle(cpu);
> > + for_each_cpu(cpu) {
> > + preempt_disable();
> > + if (cpu != smp_processor_id() && cpu_online(cpu))
> > + wake_up_if_idle(cpu);
> > + preempt_enable();
> > }
> > - cpus_read_unlock();
> > }
> > EXPORT_SYMBOL_GPL(wake_up_all_idle_cpus);
>
> This does not compile.
>
> kernel/smp.c:1173:18: error: macro "for_each_cpu" requires 2 arguments, but only 1 given

Bah, for_each_possible_cpu(), lemme update the patch, I'm sure the
robots will scream bloody murder on that too.

2021-10-19 19:28:19

by Qian Cai

[permalink] [raw]
Subject: Re: [PATCH v2 04/11] sched: Simplify wake_up_*idle*()



On 10/19/21 11:50 AM, Peter Zijlstra wrote:
>>> diff --git a/kernel/smp.c b/kernel/smp.c
>>> index ad0b68a3a3d3..61ddc7a3bafa 100644
>>> --- a/kernel/smp.c
>>> +++ b/kernel/smp.c
>>> @@ -1170,14 +1170,12 @@ void wake_up_all_idle_cpus(void)
>>> {
>>> int cpu;
>>>
>>> - cpus_read_lock();
>>> - for_each_online_cpu(cpu) {
>>> - if (cpu == raw_smp_processor_id())
>>> - continue;
>>> -
>>> - wake_up_if_idle(cpu);
>>> + for_each_cpu(cpu) {
>>> + preempt_disable();
>>> + if (cpu != smp_processor_id() && cpu_online(cpu))
>>> + wake_up_if_idle(cpu);
>>> + preempt_enable();
>>> }
>>> - cpus_read_unlock();
>>> }
>>> EXPORT_SYMBOL_GPL(wake_up_all_idle_cpus);
>>
>> This does not compile.
>>
>> kernel/smp.c:1173:18: error: macro "for_each_cpu" requires 2 arguments, but only 1 given
>
> Bah, for_each_possible_cpu(), lemme update the patch, I'm sure the
> robots will scream bloody murder on that too.

This survived the regression tests here.

2021-10-19 20:31:01

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 04/11] sched: Simplify wake_up_*idle*()

On Tue, Oct 19, 2021 at 03:22:43PM -0400, Qian Cai wrote:
>
>
> On 10/19/21 11:50 AM, Peter Zijlstra wrote:
> >>> diff --git a/kernel/smp.c b/kernel/smp.c
> >>> index ad0b68a3a3d3..61ddc7a3bafa 100644
> >>> --- a/kernel/smp.c
> >>> +++ b/kernel/smp.c
> >>> @@ -1170,14 +1170,12 @@ void wake_up_all_idle_cpus(void)
> >>> {
> >>> int cpu;
> >>>
> >>> - cpus_read_lock();
> >>> - for_each_online_cpu(cpu) {
> >>> - if (cpu == raw_smp_processor_id())
> >>> - continue;
> >>> -
> >>> - wake_up_if_idle(cpu);
> >>> + for_each_cpu(cpu) {
> >>> + preempt_disable();
> >>> + if (cpu != smp_processor_id() && cpu_online(cpu))
> >>> + wake_up_if_idle(cpu);
> >>> + preempt_enable();
> >>> }
> >>> - cpus_read_unlock();
> >>> }
> >>> EXPORT_SYMBOL_GPL(wake_up_all_idle_cpus);
> >>
> >> This does not compile.
> >>
> >> kernel/smp.c:1173:18: error: macro "for_each_cpu" requires 2 arguments, but only 1 given
> >
> > Bah, for_each_possible_cpu(), lemme update the patch, I'm sure the
> > robots will scream bloody murder on that too.
>
> This survived the regression tests here.

Thanks!

2021-10-22 13:50:51

by Marek Szyprowski

[permalink] [raw]
Subject: Re: [PATCH v2 04/11] sched: Simplify wake_up_*idle*()

Hi

On 29.09.2021 17:17, Peter Zijlstra wrote:
> Simplify and make wake_up_if_idle() more robust, also don't iterate
> the whole machine with preempt_disable() in it's caller:
> wake_up_all_idle_cpus().
>
> This prepares for another wake_up_if_idle() user that needs a full
> do_idle() cycle.
>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>

This patch landed recently in linux-next as commit 8850cb663b5c ("sched:
Simplify wake_up_*idle*()"). It causes the following warning on the
arm64 virt machine under qemu during the system suspend/resume cycle:

--->8---

 printk: Suspending console(s) (use no_console_suspend to debug)

 ============================================
 WARNING: possible recursive locking detected
 5.15.0-rc6-next-20211022 #10905 Not tainted
 --------------------------------------------
 rtcwake/1326 is trying to acquire lock:
 ffffd4e9192e8130 (cpu_hotplug_lock){++++}-{0:0}, at:
wake_up_all_idle_cpus+0x24/0x98

 but task is already holding lock:
 ffffd4e9192e8130 (cpu_hotplug_lock){++++}-{0:0}, at:
suspend_devices_and_enter+0x740/0x9f0

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(cpu_hotplug_lock);
   lock(cpu_hotplug_lock);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 5 locks held by rtcwake/1326:
  #0: ffff54ad86a78438 (sb_writers#7){.+.+}-{0:0}, at: ksys_write+0x64/0xf0
  #1: ffff54ad84094a88 (&of->mutex){+.+.}-{3:3}, at:
kernfs_fop_write_iter+0xf4/0x1a8
  #2: ffff54ad83b17a88 (kn->active#43){.+.+}-{0:0}, at:
kernfs_fop_write_iter+0xfc/0x1a8
  #3: ffffd4e9192efab0 (system_transition_mutex){+.+.}-{3:3}, at:
pm_suspend+0x214/0x3d0
  #4: ffffd4e9192e8130 (cpu_hotplug_lock){++++}-{0:0}, at:
suspend_devices_and_enter+0x740/0x9f0

 stack backtrace:
 CPU: 0 PID: 1326 Comm: rtcwake Not tainted 5.15.0-rc6-next-20211022 #10905
 Hardware name: linux,dummy-virt (DT)
 Call trace:
  dump_backtrace+0x0/0x1d0
  show_stack+0x14/0x20
  dump_stack_lvl+0x88/0xb0
  dump_stack+0x14/0x2c
  __lock_acquire+0x171c/0x17b8
  lock_acquire+0x234/0x378
  cpus_read_lock+0x5c/0x150
  wake_up_all_idle_cpus+0x24/0x98
  suspend_devices_and_enter+0x748/0x9f0
  pm_suspend+0x2b0/0x3d0
  state_store+0x84/0x108
  kobj_attr_store+0x14/0x28
  sysfs_kf_write+0x60/0x70
  kernfs_fop_write_iter+0x124/0x1a8
  new_sync_write+0xe8/0x1b0
  vfs_write+0x1d0/0x408
  ksys_write+0x64/0xf0
  __arm64_sys_write+0x14/0x20
  invoke_syscall+0x40/0xf8
  el0_svc_common.constprop.3+0x8c/0x120
  do_el0_svc_compat+0x18/0x48
  el0_svc_compat+0x48/0x100
  el0t_32_sync_handler+0xec/0x140
  el0t_32_sync+0x170/0x174
 OOM killer enabled.
 Restarting tasks ... done.
 PM: suspend exit

--->8---

Let me know if there is anything I can help to debug and fix this issue.

> ---
> kernel/sched/core.c | 14 +++++---------
> kernel/smp.c | 6 +++---
> 2 files changed, 8 insertions(+), 12 deletions(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3691,15 +3691,11 @@ void wake_up_if_idle(int cpu)
> if (!is_idle_task(rcu_dereference(rq->curr)))
> goto out;
>
> - if (set_nr_if_polling(rq->idle)) {
> - trace_sched_wake_idle_without_ipi(cpu);
> - } else {
> - rq_lock_irqsave(rq, &rf);
> - if (is_idle_task(rq->curr))
> - smp_send_reschedule(cpu);
> - /* Else CPU is not idle, do nothing here: */
> - rq_unlock_irqrestore(rq, &rf);
> - }
> + rq_lock_irqsave(rq, &rf);
> + if (is_idle_task(rq->curr))
> + resched_curr(rq);
> + /* Else CPU is not idle, do nothing here: */
> + rq_unlock_irqrestore(rq, &rf);
>
> out:
> rcu_read_unlock();
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -1170,14 +1170,14 @@ void wake_up_all_idle_cpus(void)
> {
> int cpu;
>
> - preempt_disable();
> + cpus_read_lock();
> for_each_online_cpu(cpu) {
> - if (cpu == smp_processor_id())
> + if (cpu == raw_smp_processor_id())
> continue;
>
> wake_up_if_idle(cpu);
> }
> - preempt_enable();
> + cpus_read_unlock();
> }
> EXPORT_SYMBOL_GPL(wake_up_all_idle_cpus);
>
>
>
>
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland