2022-08-26 09:52:45

by Ye Weihua

[permalink] [raw]
Subject: [PATCH] signal: fix deadlock caused by calling printk() under sighand->siglock

__dend_signal_locked() invokes __sigqueue_alloc() which may invoke a
normal printk() to print failure message. This can cause a deadlock in
the scenario reported by syz-bot below (test in 5.10):

CPU0 CPU1
---- ----
lock(&sighand->siglock);
lock(&tty->read_wait);
lock(&sighand->siglock);
lock(console_owner);

This patch specities __GFP_NOWARN to __sigqueue_alloc(), so that printk
will not be called, and this deadlock problem can be avoided.

Syzbot reported the following lockdep error:

======================================================
WARNING: possible circular locking dependency detected
5.10.0-04424-ga472e3c833d3 #1 Not tainted
------------------------------------------------------
syz-executor.2/31970 is trying to acquire lock:
ffffa00014066a60 (console_owner){-.-.}-{0:0}, at: console_trylock_spinning+0xf0/0x2e0 kernel/printk/printk.c:1854

but task is already holding lock:
ffff0000ddb38a98 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x60/0x260 kernel/signal.c:1322

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (&sighand->siglock){-.-.}-{2:2}:
validate_chain+0x6dc/0xb0c kernel/locking/lockdep.c:3728
__lock_acquire+0x498/0x940 kernel/locking/lockdep.c:4954
lock_acquire+0x228/0x580 kernel/locking/lockdep.c:5564
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0xc0/0x15c kernel/locking/spinlock.c:159
__lock_task_sighand+0xf0/0x370 kernel/signal.c:1396
lock_task_sighand include/linux/sched/signal.h:699 [inline]
task_work_add+0x1f8/0x2a0 kernel/task_work.c:58
io_req_task_work_add+0x98/0x10c fs/io_uring.c:2115
__io_async_wake+0x338/0x780 fs/io_uring.c:4984
io_poll_wake+0x40/0x50 fs/io_uring.c:5461
__wake_up_common+0xcc/0x2a0 kernel/sched/wait.c:93
__wake_up_common_lock+0xd0/0x130 kernel/sched/wait.c:123
__wake_up+0x1c/0x24 kernel/sched/wait.c:142
pty_set_termios+0x1ac/0x2d0 drivers/tty/pty.c:286
tty_set_termios+0x310/0x46c drivers/tty/tty_ioctl.c:334
set_termios.part.0+0x2dc/0xa50 drivers/tty/tty_ioctl.c:414
set_termios drivers/tty/tty_ioctl.c:368 [inline]
tty_mode_ioctl+0x4f4/0xbec drivers/tty/tty_ioctl.c:736
n_tty_ioctl_helper+0x74/0x260 drivers/tty/tty_ioctl.c:883
n_tty_ioctl+0x80/0x3d0 drivers/tty/n_tty.c:2516
tty_ioctl+0x508/0x1100 drivers/tty/tty_io.c:2751
vfs_ioctl fs/ioctl.c:48 [inline]
__do_sys_ioctl fs/ioctl.c:753 [inline]
__se_sys_ioctl fs/ioctl.c:739 [inline]
__arm64_sys_ioctl+0x12c/0x18c fs/ioctl.c:739
__invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
el0_svc_common.constprop.0+0xf8/0x420 arch/arm64/kernel/syscall.c:155
do_el0_svc+0x50/0x120 arch/arm64/kernel/syscall.c:217
el0_svc+0x20/0x30 arch/arm64/kernel/entry-common.c:353
el0_sync_handler+0xe4/0x1e0 arch/arm64/kernel/entry-common.c:369
el0_sync+0x148/0x180 arch/arm64/kernel/entry.S:683

-> #3 (&tty->read_wait){....}-{2:2}:
validate_chain+0x6dc/0xb0c kernel/locking/lockdep.c:3728
__lock_acquire+0x498/0x940 kernel/locking/lockdep.c:4954
lock_acquire+0x228/0x580 kernel/locking/lockdep.c:5564
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0xa0/0x120 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:354 [inline]
io_poll_double_wake+0x158/0x30c fs/io_uring.c:5093
__wake_up_common+0xcc/0x2a0 kernel/sched/wait.c:93
__wake_up_common_lock+0xd0/0x130 kernel/sched/wait.c:123
__wake_up+0x1c/0x24 kernel/sched/wait.c:142
pty_close+0x1bc/0x330 drivers/tty/pty.c:68
tty_release+0x1e0/0x88c drivers/tty/tty_io.c:1761
__fput+0x1dc/0x500 fs/file_table.c:281
____fput+0x24/0x30 fs/file_table.c:314
task_work_run+0xf4/0x1ec kernel/task_work.c:151
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
do_notify_resume+0x378/0x410 arch/arm64/kernel/signal.c:718
work_pending+0xc/0x198

-> #2 (&tty->write_wait){....}-{2:2}:
validate_chain+0x6dc/0xb0c kernel/locking/lockdep.c:3728
__lock_acquire+0x498/0x940 kernel/locking/lockdep.c:4954
lock_acquire+0x228/0x580 kernel/locking/lockdep.c:5564
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0xc0/0x15c kernel/locking/spinlock.c:159
__wake_up_common_lock+0xb0/0x130 kernel/sched/wait.c:122
__wake_up+0x1c/0x24 kernel/sched/wait.c:142
tty_wakeup+0x54/0xbc drivers/tty/tty_io.c:539
tty_port_default_wakeup+0x38/0x50 drivers/tty/tty_port.c:50
tty_port_tty_wakeup+0x3c/0x50 drivers/tty/tty_port.c:388
uart_write_wakeup+0x38/0x60 drivers/tty/serial/serial_core.c:106
pl011_tx_chars+0x530/0x5c0 drivers/tty/serial/amba-pl011.c:1418
pl011_start_tx_pio drivers/tty/serial/amba-pl011.c:1303 [inline]
pl011_start_tx+0x1b4/0x430 drivers/tty/serial/amba-pl011.c:1315
__uart_start.isra.0+0xb4/0xcc drivers/tty/serial/serial_core.c:127
uart_write+0x21c/0x460 drivers/tty/serial/serial_core.c:613
process_output_block+0x120/0x3ac drivers/tty/n_tty.c:590
n_tty_write+0x2c8/0x650 drivers/tty/n_tty.c:2383
do_tty_write drivers/tty/tty_io.c:1028 [inline]
file_tty_write.constprop.0+0x2d0/0x520 drivers/tty/tty_io.c:1118
tty_write drivers/tty/tty_io.c:1125 [inline]
redirected_tty_write+0xe4/0x104 drivers/tty/tty_io.c:1147
call_write_iter include/linux/fs.h:1960 [inline]
new_sync_write+0x264/0x37c fs/read_write.c:515
vfs_write+0x694/0x9d0 fs/read_write.c:602
ksys_write+0xfc/0x200 fs/read_write.c:655
__do_sys_write fs/read_write.c:667 [inline]
__se_sys_write fs/read_write.c:664 [inline]
__arm64_sys_write+0x50/0x60 fs/read_write.c:664
__invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
el0_svc_common.constprop.0+0xf8/0x420 arch/arm64/kernel/syscall.c:155
do_el0_svc+0x50/0x120 arch/arm64/kernel/syscall.c:217
el0_svc+0x20/0x30 arch/arm64/kernel/entry-common.c:353
el0_sync_handler+0xe4/0x1e0 arch/arm64/kernel/entry-common.c:369
el0_sync+0x148/0x180 arch/arm64/kernel/entry.S:683

-> #1 (&port_lock_key){-.-.}-{2:2}:
validate_chain+0x6dc/0xb0c kernel/locking/lockdep.c:3728
__lock_acquire+0x498/0x940 kernel/locking/lockdep.c:4954
lock_acquire+0x228/0x580 kernel/locking/lockdep.c:5564
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0xa0/0x120 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:354 [inline]
pl011_console_write+0x2f0/0x410 drivers/tty/serial/amba-pl011.c:2263
call_console_drivers.constprop.0+0x1f8/0x3b0 kernel/printk/printk.c:1932
console_unlock+0x36c/0x9ec kernel/printk/printk.c:2553
vprintk_emit+0x40c/0x4b0 kernel/printk/printk.c:2075
vprintk_default+0x48/0x54 kernel/printk/printk.c:2092
vprintk_func+0x1f0/0x40c kernel/printk/printk_safe.c:404
printk+0xbc/0xf0 kernel/printk/printk.c:2123
register_console+0x580/0x790 kernel/printk/printk.c:2905
uart_configure_port.constprop.0+0x4a0/0x4e0 drivers/tty/serial/serial_core.c:2431
uart_add_one_port+0x378/0x550 drivers/tty/serial/serial_core.c:2944
pl011_register_port+0xb4/0x210 drivers/tty/serial/amba-pl011.c:2686
pl011_probe+0x334/0x3ec drivers/tty/serial/amba-pl011.c:2736
amba_probe+0x14c/0x2f0 drivers/amba/bus.c:283
really_probe+0x210/0xa5c drivers/base/dd.c:562
driver_probe_device+0x1c8/0x280 drivers/base/dd.c:747
__device_attach_driver+0x18c/0x260 drivers/base/dd.c:853
bus_for_each_drv+0x120/0x1a0 drivers/base/bus.c:431
__device_attach+0x16c/0x3b4 drivers/base/dd.c:922
device_initial_probe+0x28/0x34 drivers/base/dd.c:971
bus_probe_device+0x124/0x13c drivers/base/bus.c:491
fw_devlink_resume+0x164/0x270 drivers/base/core.c:1601
of_platform_default_populate_init+0xf4/0x114 drivers/of/platform.c:543
do_one_initcall+0x11c/0x770 init/main.c:1217
do_initcall_level+0x364/0x388 init/main.c:1290
do_initcalls+0x90/0xc0 init/main.c:1306
do_basic_setup init/main.c:1326 [inline]
kernel_init_freeable+0x57c/0x63c init/main.c:1529
kernel_init+0x1c/0x20c init/main.c:1417
ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1034

-> #0 (console_owner){-.-.}-{0:0}:
check_prev_add+0xe0/0x105c kernel/locking/lockdep.c:2988
check_prevs_add+0x1c8/0x3d4 kernel/locking/lockdep.c:3113
validate_chain+0x6dc/0xb0c kernel/locking/lockdep.c:3728
__lock_acquire+0x498/0x940 kernel/locking/lockdep.c:4954
lock_acquire+0x228/0x580 kernel/locking/lockdep.c:5564
console_trylock_spinning+0x130/0x2e0 kernel/printk/printk.c:1875
vprintk_emit+0x268/0x4b0 kernel/printk/printk.c:2074
vprintk_default+0x48/0x54 kernel/printk/printk.c:2092
vprintk_func+0x1f0/0x40c kernel/printk/printk_safe.c:404
printk+0xbc/0xf0 kernel/printk/printk.c:2123
fail_dump lib/fault-inject.c:45 [inline]
should_fail+0x2a0/0x370 lib/fault-inject.c:146
__should_failslab+0x8c/0xe0 mm/failslab.c:33
should_failslab+0x14/0x2c mm/slab_common.c:1181
slab_pre_alloc_hook mm/slab.h:495 [inline]
slab_alloc_node mm/slub.c:2842 [inline]
slab_alloc mm/slub.c:2931 [inline]
kmem_cache_alloc+0x8c/0xe64 mm/slub.c:2936
__sigqueue_alloc+0x224/0x5a4 kernel/signal.c:437
__send_signal+0x700/0xeac kernel/signal.c:1121
send_signal+0x348/0x6a0 kernel/signal.c:1247
force_sig_info_to_task+0x184/0x260 kernel/signal.c:1339
force_sig_fault_to_task kernel/signal.c:1678 [inline]
force_sig_fault+0xb0/0xf0 kernel/signal.c:1685
arm64_force_sig_fault arch/arm64/kernel/traps.c:182 [inline]
arm64_notify_die arch/arm64/kernel/traps.c:208 [inline]
arm64_notify_die+0xdc/0x160 arch/arm64/kernel/traps.c:199
do_sp_pc_abort+0x4c/0x60 arch/arm64/mm/fault.c:794
el0_pc+0xd8/0x19c arch/arm64/kernel/entry-common.c:309
el0_sync_handler+0x12c/0x1e0 arch/arm64/kernel/entry-common.c:394
el0_sync+0x148/0x180 arch/arm64/kernel/entry.S:683

other info that might help us debug this:

Chain exists of:
console_owner --> &tty->read_wait --> &sighand->siglock

Signed-off-by: Ye Weihua <[email protected]>
---
kernel/signal.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 6f86fda5e432..3a3581739a6d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1120,7 +1120,8 @@ static int __send_signal_locked(int sig, struct kernel_siginfo *info,
else
override_rlimit = 0;

- q = __sigqueue_alloc(sig, t, GFP_ATOMIC, override_rlimit, 0);
+ q = __sigqueue_alloc(sig, t, GFP_ATOMIC | __GFP_NOWARN,
+ override_rlimit, 0);

if (q) {
list_add_tail(&q->list, &pending->list);
--
2.17.1


2022-08-26 22:00:07

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] signal: fix deadlock caused by calling printk() under sighand->siglock

Ye Weihua <[email protected]> writes:

> __dend_signal_locked() invokes __sigqueue_alloc() which may invoke a
> normal printk() to print failure message. This can cause a deadlock in
> the scenario reported by syz-bot below (test in 5.10):
>
> CPU0 CPU1
> ---- ----
> lock(&sighand->siglock);
> lock(&tty->read_wait);
> lock(&sighand->siglock);
> lock(console_owner);
>
> This patch specities __GFP_NOWARN to __sigqueue_alloc(), so that printk
> will not be called, and this deadlock problem can be avoided.

While the patch below will in theory fix the reported deadlock, I don't
think it is a good choice of fix. As a rule we want to allow printk to
be callable in as many places as possible, so that it can be used for
debugging. There are enough places that take siglock that outlawing
printk under siglock will make the kernel unstable.

I tried to read the current kernel and verify this deadlock to see if I
could suggest a better location to change the code to fix the deadlock.
Say modifying task_work_add to not take siglock. The current
task_work_add does not take siglock. I encountered a few other
significant function differences as well. One significant difference is
that io_poll_double_wake no longer exists.

I think the amb-pl011.c driver might even be more different yet.

Can you reproduce this on current kernels?

Reading the code I think this is already fixed.

Perhaps you want to read the code of the affected subsystems and pick
some appropriate changes to backport to 5.10?

Eric

2022-09-05 06:47:02

by Ye Weihua

[permalink] [raw]
Subject: Re: [PATCH] signal: fix deadlock caused by calling printk() under sighand->siglock


On 2022/8/27 5:23, Eric W. Biederman wrote:
> While the patch below will in theory fix the reported deadlock, I don't
> think it is a good choice of fix. As a rule we want to allow printk to
> be callable in as many places as possible, so that it can be used for
> debugging. There are enough places that take siglock that outlawing
> printk under siglock will make the kernel unstable.
>
> I tried to read the current kernel and verify this deadlock to see if I
> could suggest a better location to change the code to fix the deadlock.
> Say modifying task_work_add to not take siglock. The current
> task_work_add does not take siglock. I encountered a few other
> significant function differences as well. One significant difference is
> that io_poll_double_wake no longer exists.
>
> I think the amb-pl011.c driver might even be more different yet.
>
> Can you reproduce this on current kernels?
>
> Reading the code I think this is already fixed.
>
> Perhaps you want to read the code of the affected subsystems and pick
> some appropriate changes to backport to 5.10?
>
> Eric
>
> .

Thank you for your reply.


I tested it on the current version for a few days using the same case
and the

problem didn't occur again. You're right, the problem has been fixed.
I've read

some of the code and learned about some of the changes, and I'll try to
backport

the relevant patches to the problem version later.