We are trying an S3 suspend stress test and occasionally while
entering S3 we get a console deadlock. Is this a known issue? I'm not
really sure why a probe would be happening while suspending.
The kernel command line is `console=ttyS0` and kernel 5.4.39.
Here is the log:
[ 278.885831] leds mmc0::: calling led_resume+0x0/0x1c @ 3909,
parent: 0000:02:00.0
[ 278.885835] leds mmc0::: led_resume+0x0/0x1c returned 0 after 0 usecs
[ 278.890230] secondary_startup_64+0xa4/0xb0
[ 278.890236] handlers:
[ 279.752138] [<00000000b8e5d375>] i8042_interrupt
[ 279.756915] Disabling IRQ #1
[ 279.759801]
[ 279.759802] ======================================================
[ 279.759803] WARNING: possible circular locking dependency detected
[ 279.759803] 5.4.39 #55 Not tainted
[ 279.759804] ------------------------------------------------------
[ 279.759804] swapper/0/0 is trying to acquire lock:
[ 279.759805] ffffffffab65b6c0 (console_owner){-...}, at:
console_lock_spinning_enable+0x31/0x57
[ 279.759806]
[ 279.759807] but task is already holding lock:
[ 279.759807] ffff88810a8e34c0 (&irq_desc_lock_class){-.-.}, at:
__report_bad_irq+0x5b/0xba
[ 279.759809]
[ 279.759809] which lock already depends on the new lock.
[ 279.759810]
[ 279.759810]
[ 279.759811] the existing dependency chain (in reverse order) is:
[ 279.759811]
[ 279.759811] -> #2 (&irq_desc_lock_class){-.-.}:
[ 279.759813] _raw_spin_lock_irqsave+0x61/0x8d
[ 279.759813] __irq_get_desc_lock+0x65/0x89
[ 279.759814] __disable_irq_nosync+0x3b/0x93
[ 279.759814] serial8250_do_startup+0x451/0x75c
[ 279.759815] uart_startup+0x1b4/0x2ff
[ 279.759815] uart_port_activate+0x73/0xa0
[ 279.759815] tty_port_open+0xae/0x10a
[ 279.759816] uart_open+0x1b/0x26
[ 279.759816] tty_open+0x24d/0x3a0
[ 279.759817] chrdev_open+0xd5/0x1cc
[ 279.759817] do_dentry_open+0x299/0x3c8
[ 279.759817] path_openat+0x434/0x1100
[ 279.759818] do_filp_open+0x9b/0x10a
[ 279.759818] do_sys_open+0x15f/0x3d7
[ 279.759819] kernel_init_freeable+0x157/0x1dd
[ 279.759819] kernel_init+0xe/0x105
[ 279.759819] ret_from_fork+0x27/0x50
[ 279.759820]
[ 279.759820] -> #1 (&port_lock_key){-.-.}:
[ 279.759822] _raw_spin_lock_irqsave+0x61/0x8d
[ 279.759822] serial8250_console_write+0xa7/0x2a0
[ 279.759823] console_unlock+0x3b7/0x528
[ 279.759823] vprintk_emit+0x111/0x17f
[ 279.759823] printk+0x59/0x73
[ 279.759824] register_console+0x336/0x3a4
[ 279.759824] uart_add_one_port+0x51b/0x5be
[ 279.759825] serial8250_register_8250_port+0x454/0x55e
[ 279.759825] dw8250_probe+0x4dc/0x5b9
[ 279.759825] platform_drv_probe+0x67/0x8b
[ 279.759826] really_probe+0x14a/0x422
[ 279.759826] driver_probe_device+0x66/0x130
[ 279.759827] device_driver_attach+0x42/0x5b
[ 279.759827] __driver_attach+0xca/0x139
[ 279.759827] bus_for_each_dev+0x97/0xc9
[ 279.759828] bus_add_driver+0x12b/0x228
[ 279.759828] driver_register+0x64/0xed
[ 279.759829] do_one_initcall+0x20c/0x4a6
[ 279.759829] do_initcall_level+0xb5/0xc5
[ 279.759829] do_basic_setup+0x4c/0x58
[ 279.759830] kernel_init_freeable+0x13f/0x1dd
[ 279.759830] kernel_init+0xe/0x105
[ 279.759831] ret_from_fork+0x27/0x50
[ 279.759831]
[ 279.759831] -> #0 (console_owner){-...}:
[ 279.759833] __lock_acquire+0x118d/0x2714
[ 279.759833] lock_acquire+0x203/0x258
[ 279.759834] console_lock_spinning_enable+0x51/0x57
[ 279.759834] console_unlock+0x25d/0x528
[ 279.759834] vprintk_emit+0x111/0x17f
[ 279.759835] printk+0x59/0x73
[ 279.759835] __report_bad_irq+0xa3/0xba
[ 279.759836] note_interrupt+0x19a/0x1d6
[ 279.759836] handle_irq_event_percpu+0x57/0x79
[ 279.759836] handle_irq_event+0x36/0x55
[ 279.759837] handle_fasteoi_irq+0xc2/0x18a
[ 279.759837] do_IRQ+0xb3/0x157
[ 279.759838] ret_from_intr+0x0/0x1d
[ 279.759838] cpuidle_enter_state+0x12f/0x1fd
[ 279.759838] cpuidle_enter+0x2e/0x3d
[ 279.759839] do_idle+0x1ce/0x2ce
[ 279.759839] cpu_startup_entry+0x1d/0x1f
[ 279.759840] start_kernel+0x406/0x46a
[ 279.759840] secondary_startup_64+0xa4/0xb0
[ 279.759840]
[ 279.759841] other info that might help us debug this:
[ 279.759841]
[ 279.759841] Chain exists of:
[ 279.759842] console_owner --> &port_lock_key --> &irq_desc_lock_class
[ 279.759844]
[ 279.759844] Possible unsafe locking scenario:
[ 279.759845]
[ 279.759845] CPU0 CPU1
[ 279.759845] ---- ----
[ 279.759846] lock(&irq_desc_lock_class);
[ 279.759847] lock(&port_lock_key);
[ 279.759848] lock(&irq_desc_lock_class);
[ 279.759849] lock(console_owner);
[ 279.759850]
[ 279.759850] *** DEADLOCK ***
[ 279.759850]
[ 279.759851] 2 locks held by swapper/0/0:
[ 279.759851] #0: ffff88810a8e34c0 (&irq_desc_lock_class){-.-.}, at:
__report_bad_irq+0x5b/0xba
[ 279.759853] #1: ffffffffab65b5c0 (console_lock){+.+.}, at:
console_trylock_spinning+0x20/0x181
[ 279.759855]
[ 279.759855] stack backtrace:
[ 279.759856] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.39 #55
[ 279.759856] Hardware name: XXXXXX
[ 279.759857] Call Trace:
[ 279.759857] <IRQ>
[ 279.759857] dump_stack+0xbf/0x133
[ 279.759858] ? print_circular_bug+0xd6/0xe9
[ 279.759858] check_noncircular+0x1b9/0x1c3
[ 279.759858] __lock_acquire+0x118d/0x2714
[ 279.759859] lock_acquire+0x203/0x258
[ 279.759859] ? console_lock_spinning_enable+0x31/0x57
[ 279.759860] console_lock_spinning_enable+0x51/0x57
[ 279.759860] ? console_lock_spinning_enable+0x31/0x57
[ 279.759861] console_unlock+0x25d/0x528
[ 279.759861] ? console_trylock+0x18/0x4e
[ 279.759861] vprintk_emit+0x111/0x17f
[ 279.759862] ? lock_acquire+0x203/0x258
[ 279.759862] printk+0x59/0x73
[ 279.759862] __report_bad_irq+0xa3/0xba
[ 279.759863] note_interrupt+0x19a/0x1d6
[ 279.759863] handle_irq_event_percpu+0x57/0x79
[ 279.759864] handle_irq_event+0x36/0x55
[ 279.759864] handle_fasteoi_irq+0xc2/0x18a
[ 279.759864] do_IRQ+0xb3/0x157
[ 279.759865] common_interrupt+0xf/0xf
[ 279.759865] </IRQ>
[ 279.759866] RIP: 0010:cpuidle_enter_state+0x12f/0x1fd
[ 279.759867] Code: aa aa aa aa aa aa aa 48 89 45 c8 9c 8f 45 c8 0f
ba 65 c8 09 0f 82 cc 00 00 00 31 ff e8 57 25 76 ff e8 36 bb 7b ff fb
45 85 ff <78> 47 44 89 f8 48 6b d0 68 49 8b 4c 16 48 4c 2b 65 b8 4c 89
63 18
[ 279.759867] RSP: 0018:ffffffffab607e20 EFLAGS: 00000202 ORIG_RAX:
ffffffffffffffda
[ 279.759869] RAX: 86733df5ce911a00 RBX: ffff888104709000 RCX: ffffffffa9bb229b
[ 279.759870] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffffffffab62d780
[ 279.759870] RBP: ffffffffab607e68 R08: dffffc0000000000 R09: fffffbfff56eb96f
[ 279.759871] R10: fffffbfff56eb96f R11: ffffffffa9f4f800 R12: 00000040eb0f7ad6
[ 279.759871] R13: 0000000000000000 R14: ffffffffab6caf68 R15: 0000000000000001
[ 279.759871] ? flatten_lpi_states+0x190/0x1d7
[ 279.759872] ? trace_irq_enable_rcuidle+0x22/0xee
[ 279.759872] cpuidle_enter+0x2e/0x3d
[ 279.759872] do_idle+0x1ce/0x2ce
[ 279.759873] cpu_startup_entry+0x1d/0x1f
[ 279.759873] start_kernel+0x406/0x46a
[ 279.759873] secondary_startup_64+0xa4/0xb0
[ 279.927956] i2c_designware AMD0010:00: controller timed out
[ 280.388120] elan_i2c i2c-ELAN0000:00: reading cmd (0x0307) fail.
[ 280.394413] elan_i2c i2c-ELAN0000:00: failed to read current power
state: -110
[ 280.542626] elan_i2c i2c-ELAN0000:00: elan_resume+0x0/0x5f returned
0 after 1607813 usecs
Thanks,
Raul
+Cc: Tony and console people
On Mon, Jun 22, 2020 at 8:32 PM Raul Rangel <[email protected]> wrote:
>
> We are trying an S3 suspend stress test and occasionally while
> entering S3 we get a console deadlock. Is this a known issue? I'm not
> really sure why a probe would be happening while suspending.
>
> The kernel command line is `console=ttyS0` and kernel 5.4.39.
>
> Here is the log:
>
> [ 278.885831] leds mmc0::: calling led_resume+0x0/0x1c @ 3909,
> parent: 0000:02:00.0
> [ 278.885835] leds mmc0::: led_resume+0x0/0x1c returned 0 after 0 usecs
> [ 278.890230] secondary_startup_64+0xa4/0xb0
> [ 278.890236] handlers:
> [ 279.752138] [<00000000b8e5d375>] i8042_interrupt
> [ 279.756915] Disabling IRQ #1
> [ 279.759801]
> [ 279.759802] ======================================================
> [ 279.759803] WARNING: possible circular locking dependency detected
> [ 279.759803] 5.4.39 #55 Not tainted
> [ 279.759804] ------------------------------------------------------
> [ 279.759804] swapper/0/0 is trying to acquire lock:
> [ 279.759805] ffffffffab65b6c0 (console_owner){-...}, at:
> console_lock_spinning_enable+0x31/0x57
> [ 279.759806]
> [ 279.759807] but task is already holding lock:
> [ 279.759807] ffff88810a8e34c0 (&irq_desc_lock_class){-.-.}, at:
> __report_bad_irq+0x5b/0xba
> [ 279.759809]
> [ 279.759809] which lock already depends on the new lock.
> [ 279.759810]
> [ 279.759810]
> [ 279.759811] the existing dependency chain (in reverse order) is:
> [ 279.759811]
> [ 279.759811] -> #2 (&irq_desc_lock_class){-.-.}:
> [ 279.759813] _raw_spin_lock_irqsave+0x61/0x8d
> [ 279.759813] __irq_get_desc_lock+0x65/0x89
> [ 279.759814] __disable_irq_nosync+0x3b/0x93
> [ 279.759814] serial8250_do_startup+0x451/0x75c
> [ 279.759815] uart_startup+0x1b4/0x2ff
> [ 279.759815] uart_port_activate+0x73/0xa0
> [ 279.759815] tty_port_open+0xae/0x10a
> [ 279.759816] uart_open+0x1b/0x26
> [ 279.759816] tty_open+0x24d/0x3a0
> [ 279.759817] chrdev_open+0xd5/0x1cc
> [ 279.759817] do_dentry_open+0x299/0x3c8
> [ 279.759817] path_openat+0x434/0x1100
> [ 279.759818] do_filp_open+0x9b/0x10a
> [ 279.759818] do_sys_open+0x15f/0x3d7
> [ 279.759819] kernel_init_freeable+0x157/0x1dd
> [ 279.759819] kernel_init+0xe/0x105
> [ 279.759819] ret_from_fork+0x27/0x50
> [ 279.759820]
> [ 279.759820] -> #1 (&port_lock_key){-.-.}:
> [ 279.759822] _raw_spin_lock_irqsave+0x61/0x8d
> [ 279.759822] serial8250_console_write+0xa7/0x2a0
> [ 279.759823] console_unlock+0x3b7/0x528
> [ 279.759823] vprintk_emit+0x111/0x17f
> [ 279.759823] printk+0x59/0x73
> [ 279.759824] register_console+0x336/0x3a4
> [ 279.759824] uart_add_one_port+0x51b/0x5be
> [ 279.759825] serial8250_register_8250_port+0x454/0x55e
> [ 279.759825] dw8250_probe+0x4dc/0x5b9
> [ 279.759825] platform_drv_probe+0x67/0x8b
> [ 279.759826] really_probe+0x14a/0x422
> [ 279.759826] driver_probe_device+0x66/0x130
> [ 279.759827] device_driver_attach+0x42/0x5b
> [ 279.759827] __driver_attach+0xca/0x139
> [ 279.759827] bus_for_each_dev+0x97/0xc9
> [ 279.759828] bus_add_driver+0x12b/0x228
> [ 279.759828] driver_register+0x64/0xed
> [ 279.759829] do_one_initcall+0x20c/0x4a6
> [ 279.759829] do_initcall_level+0xb5/0xc5
> [ 279.759829] do_basic_setup+0x4c/0x58
> [ 279.759830] kernel_init_freeable+0x13f/0x1dd
> [ 279.759830] kernel_init+0xe/0x105
> [ 279.759831] ret_from_fork+0x27/0x50
> [ 279.759831]
> [ 279.759831] -> #0 (console_owner){-...}:
> [ 279.759833] __lock_acquire+0x118d/0x2714
> [ 279.759833] lock_acquire+0x203/0x258
> [ 279.759834] console_lock_spinning_enable+0x51/0x57
> [ 279.759834] console_unlock+0x25d/0x528
> [ 279.759834] vprintk_emit+0x111/0x17f
> [ 279.759835] printk+0x59/0x73
> [ 279.759835] __report_bad_irq+0xa3/0xba
> [ 279.759836] note_interrupt+0x19a/0x1d6
> [ 279.759836] handle_irq_event_percpu+0x57/0x79
> [ 279.759836] handle_irq_event+0x36/0x55
> [ 279.759837] handle_fasteoi_irq+0xc2/0x18a
> [ 279.759837] do_IRQ+0xb3/0x157
> [ 279.759838] ret_from_intr+0x0/0x1d
> [ 279.759838] cpuidle_enter_state+0x12f/0x1fd
> [ 279.759838] cpuidle_enter+0x2e/0x3d
> [ 279.759839] do_idle+0x1ce/0x2ce
> [ 279.759839] cpu_startup_entry+0x1d/0x1f
> [ 279.759840] start_kernel+0x406/0x46a
> [ 279.759840] secondary_startup_64+0xa4/0xb0
> [ 279.759840]
> [ 279.759841] other info that might help us debug this:
> [ 279.759841]
> [ 279.759841] Chain exists of:
> [ 279.759842] console_owner --> &port_lock_key --> &irq_desc_lock_class
> [ 279.759844]
> [ 279.759844] Possible unsafe locking scenario:
> [ 279.759845]
> [ 279.759845] CPU0 CPU1
> [ 279.759845] ---- ----
> [ 279.759846] lock(&irq_desc_lock_class);
> [ 279.759847] lock(&port_lock_key);
> [ 279.759848] lock(&irq_desc_lock_class);
> [ 279.759849] lock(console_owner);
> [ 279.759850]
> [ 279.759850] *** DEADLOCK ***
> [ 279.759850]
> [ 279.759851] 2 locks held by swapper/0/0:
> [ 279.759851] #0: ffff88810a8e34c0 (&irq_desc_lock_class){-.-.}, at:
> __report_bad_irq+0x5b/0xba
> [ 279.759853] #1: ffffffffab65b5c0 (console_lock){+.+.}, at:
> console_trylock_spinning+0x20/0x181
> [ 279.759855]
> [ 279.759855] stack backtrace:
> [ 279.759856] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.39 #55
> [ 279.759856] Hardware name: XXXXXX
> [ 279.759857] Call Trace:
> [ 279.759857] <IRQ>
> [ 279.759857] dump_stack+0xbf/0x133
> [ 279.759858] ? print_circular_bug+0xd6/0xe9
> [ 279.759858] check_noncircular+0x1b9/0x1c3
> [ 279.759858] __lock_acquire+0x118d/0x2714
> [ 279.759859] lock_acquire+0x203/0x258
> [ 279.759859] ? console_lock_spinning_enable+0x31/0x57
> [ 279.759860] console_lock_spinning_enable+0x51/0x57
> [ 279.759860] ? console_lock_spinning_enable+0x31/0x57
> [ 279.759861] console_unlock+0x25d/0x528
> [ 279.759861] ? console_trylock+0x18/0x4e
> [ 279.759861] vprintk_emit+0x111/0x17f
> [ 279.759862] ? lock_acquire+0x203/0x258
> [ 279.759862] printk+0x59/0x73
> [ 279.759862] __report_bad_irq+0xa3/0xba
> [ 279.759863] note_interrupt+0x19a/0x1d6
> [ 279.759863] handle_irq_event_percpu+0x57/0x79
> [ 279.759864] handle_irq_event+0x36/0x55
> [ 279.759864] handle_fasteoi_irq+0xc2/0x18a
> [ 279.759864] do_IRQ+0xb3/0x157
> [ 279.759865] common_interrupt+0xf/0xf
> [ 279.759865] </IRQ>
> [ 279.759866] RIP: 0010:cpuidle_enter_state+0x12f/0x1fd
> [ 279.759867] Code: aa aa aa aa aa aa aa 48 89 45 c8 9c 8f 45 c8 0f
> ba 65 c8 09 0f 82 cc 00 00 00 31 ff e8 57 25 76 ff e8 36 bb 7b ff fb
> 45 85 ff <78> 47 44 89 f8 48 6b d0 68 49 8b 4c 16 48 4c 2b 65 b8 4c 89
> 63 18
> [ 279.759867] RSP: 0018:ffffffffab607e20 EFLAGS: 00000202 ORIG_RAX:
> ffffffffffffffda
> [ 279.759869] RAX: 86733df5ce911a00 RBX: ffff888104709000 RCX: ffffffffa9bb229b
> [ 279.759870] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffffffffab62d780
> [ 279.759870] RBP: ffffffffab607e68 R08: dffffc0000000000 R09: fffffbfff56eb96f
> [ 279.759871] R10: fffffbfff56eb96f R11: ffffffffa9f4f800 R12: 00000040eb0f7ad6
> [ 279.759871] R13: 0000000000000000 R14: ffffffffab6caf68 R15: 0000000000000001
> [ 279.759871] ? flatten_lpi_states+0x190/0x1d7
> [ 279.759872] ? trace_irq_enable_rcuidle+0x22/0xee
> [ 279.759872] cpuidle_enter+0x2e/0x3d
> [ 279.759872] do_idle+0x1ce/0x2ce
> [ 279.759873] cpu_startup_entry+0x1d/0x1f
> [ 279.759873] start_kernel+0x406/0x46a
> [ 279.759873] secondary_startup_64+0xa4/0xb0
> [ 279.927956] i2c_designware AMD0010:00: controller timed out
> [ 280.388120] elan_i2c i2c-ELAN0000:00: reading cmd (0x0307) fail.
> [ 280.394413] elan_i2c i2c-ELAN0000:00: failed to read current power
> state: -110
> [ 280.542626] elan_i2c i2c-ELAN0000:00: elan_resume+0x0/0x5f returned
> 0 after 1607813 usecs
>
>
> Thanks,
> Raul
--
With Best Regards,
Andy Shevchenko
Cc-ing more people
https://lore.kernel.org/lkml/CAHp75Vd8nTzmZdnhpTDChdc11zyCaSfeigbxaCpOWZ1Lv9ZBMw@mail.gmail.com
On (20/06/22 20:37), Andy Shevchenko wrote:
> > [ 279.759811] -> #2 (&irq_desc_lock_class){-.-.}:
> > [ 279.759813] _raw_spin_lock_irqsave+0x61/0x8d
> > [ 279.759813] __irq_get_desc_lock+0x65/0x89
> > [ 279.759814] __disable_irq_nosync+0x3b/0x93
> > [ 279.759814] serial8250_do_startup+0x451/0x75c
> > [ 279.759815] uart_startup+0x1b4/0x2ff
> > [ 279.759815] uart_port_activate+0x73/0xa0
> > [ 279.759815] tty_port_open+0xae/0x10a
> > [ 279.759816] uart_open+0x1b/0x26
> > [ 279.759816] tty_open+0x24d/0x3a0
> > [ 279.759817] chrdev_open+0xd5/0x1cc
> > [ 279.759817] do_dentry_open+0x299/0x3c8
> > [ 279.759817] path_openat+0x434/0x1100
> > [ 279.759818] do_filp_open+0x9b/0x10a
> > [ 279.759818] do_sys_open+0x15f/0x3d7
> > [ 279.759819] kernel_init_freeable+0x157/0x1dd
> > [ 279.759819] kernel_init+0xe/0x105
> > [ 279.759819] ret_from_fork+0x27/0x50
> > [ 279.759820]
> > [ 279.759820] -> #1 (&port_lock_key){-.-.}:
> > [ 279.759822] _raw_spin_lock_irqsave+0x61/0x8d
> > [ 279.759822] serial8250_console_write+0xa7/0x2a0
> > [ 279.759823] console_unlock+0x3b7/0x528
> > [ 279.759823] vprintk_emit+0x111/0x17f
> > [ 279.759823] printk+0x59/0x73
> > [ 279.759824] register_console+0x336/0x3a4
> > [ 279.759824] uart_add_one_port+0x51b/0x5be
> > [ 279.759825] serial8250_register_8250_port+0x454/0x55e
> > [ 279.759825] dw8250_probe+0x4dc/0x5b9
> > [ 279.759825] platform_drv_probe+0x67/0x8b
> > [ 279.759826] really_probe+0x14a/0x422
> > [ 279.759826] driver_probe_device+0x66/0x130
> > [ 279.759827] device_driver_attach+0x42/0x5b
> > [ 279.759827] __driver_attach+0xca/0x139
> > [ 279.759827] bus_for_each_dev+0x97/0xc9
> > [ 279.759828] bus_add_driver+0x12b/0x228
> > [ 279.759828] driver_register+0x64/0xed
> > [ 279.759829] do_one_initcall+0x20c/0x4a6
> > [ 279.759829] do_initcall_level+0xb5/0xc5
> > [ 279.759829] do_basic_setup+0x4c/0x58
> > [ 279.759830] kernel_init_freeable+0x13f/0x1dd
> > [ 279.759830] kernel_init+0xe/0x105
> > [ 279.759831] ret_from_fork+0x27/0x50
> > [ 279.759831]
> > [ 279.759831] -> #0 (console_owner){-...}:
> > [ 279.759833] __lock_acquire+0x118d/0x2714
> > [ 279.759833] lock_acquire+0x203/0x258
> > [ 279.759834] console_lock_spinning_enable+0x51/0x57
> > [ 279.759834] console_unlock+0x25d/0x528
> > [ 279.759834] vprintk_emit+0x111/0x17f
> > [ 279.759835] printk+0x59/0x73
> > [ 279.759835] __report_bad_irq+0xa3/0xba
> > [ 279.759836] note_interrupt+0x19a/0x1d6
> > [ 279.759836] handle_irq_event_percpu+0x57/0x79
> > [ 279.759836] handle_irq_event+0x36/0x55
> > [ 279.759837] handle_fasteoi_irq+0xc2/0x18a
> > [ 279.759837] do_IRQ+0xb3/0x157
> > [ 279.759838] ret_from_intr+0x0/0x1d
> > [ 279.759838] cpuidle_enter_state+0x12f/0x1fd
> > [ 279.759838] cpuidle_enter+0x2e/0x3d
> > [ 279.759839] do_idle+0x1ce/0x2ce
> > [ 279.759839] cpu_startup_entry+0x1d/0x1f
> > [ 279.759840] start_kernel+0x406/0x46a
> > [ 279.759840] secondary_startup_64+0xa4/0xb0
Hmm. So this is uart->port => desc->lock vs desc->lock => uart->port
chain #1:
serial8250_do_startup()
spin_lock_irqsave(&port->lock);
disable_irq_nosync(port->irq);
raw_spin_lock_irqsave(&desc->lock)
chain #2:
__report_bad_irq()
raw_spin_lock_irqsave(&desc->lock)
for_each_action_of_desc()
printk()
spin_lock_irqsave(&port->lock);
Breaking up chain #2 is not an option, I suppose. Those are a rather
important KERN_ERR messages, printk_deferred() will upset people badly.
So... Do we need to hold uart->port when we disable port->irq? What do we
race with? Module removal? The function bumps device PM counter (albeit
for UART_CAP_RPM ports only). But, at the same time, we do a whole bunch
of unprotected port->FOO accesses in serial8250_do_startup(). We even set
the IRQF_SHARED up->port.irqflags without grabbing the port->lock:
up->port.irqflags |= IRQF_SHARED;
spin_lock_irqsave(&port->lock, flags);
if (up->port.irqflags & IRQF_SHARED)
disable_irq_nosync(port->irq);
IOW, can we do something like this?
---
drivers/tty/serial/8250/8250_port.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index d64ca77d9cfa..ad30991e1b3b 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -2275,6 +2275,11 @@ int serial8250_do_startup(struct uart_port *port)
if (port->irq && !(up->port.flags & UPF_NO_THRE_TEST)) {
unsigned char iir1;
+ bool irq_shared = up->port.irqflags & IRQF_SHARED;
+
+ if (irq_shared)
+ disable_irq_nosync(port->irq);
+
/*
* Test for UARTs that do not reassert THRE when the
* transmitter is idle and the interrupt has already
@@ -2284,8 +2289,6 @@ int serial8250_do_startup(struct uart_port *port)
* allow register changes to become visible.
*/
spin_lock_irqsave(&port->lock, flags);
- if (up->port.irqflags & IRQF_SHARED)
- disable_irq_nosync(port->irq);
wait_for_xmitr(up, UART_LSR_THRE);
serial_port_out_sync(port, UART_IER, UART_IER_THRI);
@@ -2297,9 +2300,9 @@ int serial8250_do_startup(struct uart_port *port)
iir = serial_port_in(port, UART_IIR);
serial_port_out(port, UART_IER, 0);
- if (port->irqflags & IRQF_SHARED)
- enable_irq(port->irq);
spin_unlock_irqrestore(&port->lock, flags);
+ if (irq_shared)
+ enable_irq(port->irq);
/*
* If the interrupt is not reasserted, or we otherwise
--
2.27.0
On Tue 2020-06-30 12:58:16, Sergey Senozhatsky wrote:
> Cc-ing more people
>
> https://lore.kernel.org/lkml/CAHp75Vd8nTzmZdnhpTDChdc11zyCaSfeigbxaCpOWZ1Lv9ZBMw@mail.gmail.com
>
> On (20/06/22 20:37), Andy Shevchenko wrote:
> > > [ 279.759811] -> #2 (&irq_desc_lock_class){-.-.}:
> > > [ 279.759813] _raw_spin_lock_irqsave+0x61/0x8d
> > > [ 279.759813] __irq_get_desc_lock+0x65/0x89
> > > [ 279.759814] __disable_irq_nosync+0x3b/0x93
> > > [ 279.759814] serial8250_do_startup+0x451/0x75c
> > > [ 279.759815] uart_startup+0x1b4/0x2ff
> > > [ 279.759815] uart_port_activate+0x73/0xa0
> > > [ 279.759815] tty_port_open+0xae/0x10a
> > > [ 279.759816] uart_open+0x1b/0x26
> > > [ 279.759816] tty_open+0x24d/0x3a0
> > > [ 279.759817] chrdev_open+0xd5/0x1cc
> > > [ 279.759817] do_dentry_open+0x299/0x3c8
> > > [ 279.759817] path_openat+0x434/0x1100
> > > [ 279.759818] do_filp_open+0x9b/0x10a
> > > [ 279.759818] do_sys_open+0x15f/0x3d7
> > > [ 279.759819] kernel_init_freeable+0x157/0x1dd
> > > [ 279.759819] kernel_init+0xe/0x105
> > > [ 279.759819] ret_from_fork+0x27/0x50
> > > [ 279.759820]
> > > [ 279.759820] -> #1 (&port_lock_key){-.-.}:
> > > [ 279.759822] _raw_spin_lock_irqsave+0x61/0x8d
> > > [ 279.759822] serial8250_console_write+0xa7/0x2a0
> > > [ 279.759823] console_unlock+0x3b7/0x528
> > > [ 279.759823] vprintk_emit+0x111/0x17f
> > > [ 279.759823] printk+0x59/0x73
> > > [ 279.759824] register_console+0x336/0x3a4
> > > [ 279.759824] uart_add_one_port+0x51b/0x5be
> > > [ 279.759825] serial8250_register_8250_port+0x454/0x55e
> > > [ 279.759825] dw8250_probe+0x4dc/0x5b9
> > > [ 279.759825] platform_drv_probe+0x67/0x8b
> > > [ 279.759826] really_probe+0x14a/0x422
> > > [ 279.759826] driver_probe_device+0x66/0x130
> > > [ 279.759827] device_driver_attach+0x42/0x5b
> > > [ 279.759827] __driver_attach+0xca/0x139
> > > [ 279.759827] bus_for_each_dev+0x97/0xc9
> > > [ 279.759828] bus_add_driver+0x12b/0x228
> > > [ 279.759828] driver_register+0x64/0xed
> > > [ 279.759829] do_one_initcall+0x20c/0x4a6
> > > [ 279.759829] do_initcall_level+0xb5/0xc5
> > > [ 279.759829] do_basic_setup+0x4c/0x58
> > > [ 279.759830] kernel_init_freeable+0x13f/0x1dd
> > > [ 279.759830] kernel_init+0xe/0x105
> > > [ 279.759831] ret_from_fork+0x27/0x50
> > > [ 279.759831]
> > > [ 279.759831] -> #0 (console_owner){-...}:
> > > [ 279.759833] __lock_acquire+0x118d/0x2714
> > > [ 279.759833] lock_acquire+0x203/0x258
> > > [ 279.759834] console_lock_spinning_enable+0x51/0x57
> > > [ 279.759834] console_unlock+0x25d/0x528
> > > [ 279.759834] vprintk_emit+0x111/0x17f
> > > [ 279.759835] printk+0x59/0x73
> > > [ 279.759835] __report_bad_irq+0xa3/0xba
> > > [ 279.759836] note_interrupt+0x19a/0x1d6
> > > [ 279.759836] handle_irq_event_percpu+0x57/0x79
> > > [ 279.759836] handle_irq_event+0x36/0x55
> > > [ 279.759837] handle_fasteoi_irq+0xc2/0x18a
> > > [ 279.759837] do_IRQ+0xb3/0x157
> > > [ 279.759838] ret_from_intr+0x0/0x1d
> > > [ 279.759838] cpuidle_enter_state+0x12f/0x1fd
> > > [ 279.759838] cpuidle_enter+0x2e/0x3d
> > > [ 279.759839] do_idle+0x1ce/0x2ce
> > > [ 279.759839] cpu_startup_entry+0x1d/0x1f
> > > [ 279.759840] start_kernel+0x406/0x46a
> > > [ 279.759840] secondary_startup_64+0xa4/0xb0
>
> Hmm. So this is uart->port => desc->lock vs desc->lock => uart->port
>
> chain #1:
>
> serial8250_do_startup()
> spin_lock_irqsave(&port->lock);
> disable_irq_nosync(port->irq);
> raw_spin_lock_irqsave(&desc->lock)
>
> chain #2:
>
> __report_bad_irq()
> raw_spin_lock_irqsave(&desc->lock)
> for_each_action_of_desc()
> printk()
> spin_lock_irqsave(&port->lock);
>
>
> Breaking up chain #2 is not an option, I suppose. Those are a rather
> important KERN_ERR messages, printk_deferred() will upset people badly.
Yes, we should avoid printk_deferred() unless there is another solution.
> So... Do we need to hold uart->port when we disable port->irq? What do we
> race with? Module removal? The function bumps device PM counter (albeit
> for UART_CAP_RPM ports only).
Honestly, I do not see where a PM counter gets incremented.
Anyway, __disable_irq_nosync() does nothing when
irq_get_desc_buslock() returns NULL. And irq_get_desc_buslock()
takes desc->lock when desc exist. This should be enough to
synchronize any calls.
> But, at the same time, we do a whole bunch
> of unprotected port->FOO accesses in serial8250_do_startup(). We even set
> the IRQF_SHARED up->port.irqflags without grabbing the port->lock:
>
> up->port.irqflags |= IRQF_SHARED;
> spin_lock_irqsave(&port->lock, flags);
> if (up->port.irqflags & IRQF_SHARED)
> disable_irq_nosync(port->irq);
Yup, this looks suspicious. We set a flag in port.irqflags and take the lock
only when the flag was set. Either everything needs to be done under
the lock or the lock is not needed.
Well, I might have missed something. I do not fully understand meaning
and relation of all the structures.
Anyway, I believe that this is a false positive. If I get it correctly
serial8250_do_startup() must be called before the serial port could
be registered as a console. It means that it could not be called
from inside printk().
Sigh, I do not know how to tell lockdep about these false positives.
And I am never 100% sure that we could shuffle locks in the various
the console drivers.
> IOW, can we do something like this?
>
> ---
> drivers/tty/serial/8250/8250_port.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
> index d64ca77d9cfa..ad30991e1b3b 100644
> --- a/drivers/tty/serial/8250/8250_port.c
> +++ b/drivers/tty/serial/8250/8250_port.c
> @@ -2275,6 +2275,11 @@ int serial8250_do_startup(struct uart_port *port)
>
> if (port->irq && !(up->port.flags & UPF_NO_THRE_TEST)) {
> unsigned char iir1;
> + bool irq_shared = up->port.irqflags & IRQF_SHARED;
> +
> + if (irq_shared)
> + disable_irq_nosync(port->irq);
> +
> /*
> * Test for UARTs that do not reassert THRE when the
> * transmitter is idle and the interrupt has already
> @@ -2284,8 +2289,6 @@ int serial8250_do_startup(struct uart_port *port)
> * allow register changes to become visible.
> */
> spin_lock_irqsave(&port->lock, flags);
> - if (up->port.irqflags & IRQF_SHARED)
> - disable_irq_nosync(port->irq);
>
> wait_for_xmitr(up, UART_LSR_THRE);
> serial_port_out_sync(port, UART_IER, UART_IER_THRI);
> @@ -2297,9 +2300,9 @@ int serial8250_do_startup(struct uart_port *port)
> iir = serial_port_in(port, UART_IIR);
> serial_port_out(port, UART_IER, 0);
>
> - if (port->irqflags & IRQF_SHARED)
> - enable_irq(port->irq);
> spin_unlock_irqrestore(&port->lock, flags);
> + if (irq_shared)
> + enable_irq(port->irq);
>
> /*
> * If the interrupt is not reasserted, or we otherwise
I think that it might be safe but I am not 100% sure, sigh.
Best Regards,
Petr
On (20/06/30 12:21), Petr Mladek wrote:
> > So... Do we need to hold uart->port when we disable port->irq? What do we
> > race with? Module removal? The function bumps device PM counter (albeit
> > for UART_CAP_RPM ports only).
>
> Honestly, I do not see where a PM counter gets incremented.
serial8250_do_startup()
serial8250_rpm_get()
pm_runtime_get_sync(p->port.dev)
But this does not happen for all ports, just for UART_CAP_RPM ones.
> Anyway, __disable_irq_nosync() does nothing when
> irq_get_desc_buslock() returns NULL. And irq_get_desc_buslock()
> takes desc->lock when desc exist. This should be enough to
> synchronize any calls.
>
> > But, at the same time, we do a whole bunch
> > of unprotected port->FOO accesses in serial8250_do_startup(). We even set
> > the IRQF_SHARED up->port.irqflags without grabbing the port->lock:
> >
> > up->port.irqflags |= IRQF_SHARED;
> > spin_lock_irqsave(&port->lock, flags);
> > if (up->port.irqflags & IRQF_SHARED)
> > disable_irq_nosync(port->irq);
>
> Yup, this looks suspicious. We set a flag in port.irqflags and take the lock
> only when the flag was set. Either everything needs to be done under
> the lock or the lock is not needed.
>
> Well, I might have missed something. I do not fully understand meaning
> and relation of all the structures.
>
> Anyway, I believe that this is a false positive. If I get it correctly
> serial8250_do_startup() must be called before the serial port could
> be registered as a console. It means that it could not be called
> from inside printk().
From my understanding, I'm afraid we are talking about actual deadlock
here, not about false positive report. Quoting the original email:
: We are trying an S3 suspend stress test and occasionally while
: entering S3 we get a console deadlock.
[..]
> > drivers/tty/serial/8250/8250_port.c | 11 +++++++----
> > 1 file changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
> > index d64ca77d9cfa..ad30991e1b3b 100644
> > --- a/drivers/tty/serial/8250/8250_port.c
> > +++ b/drivers/tty/serial/8250/8250_port.c
> > @@ -2275,6 +2275,11 @@ int serial8250_do_startup(struct uart_port *port)
> >
> > if (port->irq && !(up->port.flags & UPF_NO_THRE_TEST)) {
> > unsigned char iir1;
> > + bool irq_shared = up->port.irqflags & IRQF_SHARED;
> > +
> > + if (irq_shared)
> > + disable_irq_nosync(port->irq);
> > +
> > /*
> > * Test for UARTs that do not reassert THRE when the
> > * transmitter is idle and the interrupt has already
> > @@ -2284,8 +2289,6 @@ int serial8250_do_startup(struct uart_port *port)
> > * allow register changes to become visible.
> > */
> > spin_lock_irqsave(&port->lock, flags);
> > - if (up->port.irqflags & IRQF_SHARED)
> > - disable_irq_nosync(port->irq);
> >
> > wait_for_xmitr(up, UART_LSR_THRE);
> > serial_port_out_sync(port, UART_IER, UART_IER_THRI);
> > @@ -2297,9 +2300,9 @@ int serial8250_do_startup(struct uart_port *port)
> > iir = serial_port_in(port, UART_IIR);
> > serial_port_out(port, UART_IER, 0);
> >
> > - if (port->irqflags & IRQF_SHARED)
> > - enable_irq(port->irq);
> > spin_unlock_irqrestore(&port->lock, flags);
> > + if (irq_shared)
> > + enable_irq(port->irq);
> >
> > /*
> > * If the interrupt is not reasserted, or we otherwise
>
> I think that it might be safe but I am not 100% sure, sigh.
Yeah, I'm not 100%, but I'd give it a try.
-ss
On Tue, Jun 30, 2020 at 07:55:12PM +0900, Sergey Senozhatsky wrote:
> On (20/06/30 12:21), Petr Mladek wrote:
...
> > I think that it might be safe but I am not 100% sure, sigh.
>
> Yeah, I'm not 100%, but I'd give it a try.
Thanks for the patch! I think we have to wait for the original reporter.
Raul, can you test the change Sergey proposed in the thread?
--
With Best Regards,
Andy Shevchenko
On Tue 2020-06-30 19:55:12, Sergey Senozhatsky wrote:
> On (20/06/30 12:21), Petr Mladek wrote:
> > Anyway, I believe that this is a false positive. If I get it correctly
> > serial8250_do_startup() must be called before the serial port could
> > be registered as a console. It means that it could not be called
> > from inside printk().
>
> >From my understanding, I'm afraid we are talking about actual deadlock
> here, not about false positive report. Quoting the original email:
IMHO, it is not clear. The log contains lockdep report. It does not mean
the deadlock really happened. The lockdep reports cyclic dependency but it
is not able to say if the related code chains might be called at the
same time in reality.
And this combination has signs of false positives. As I said, I think
that the serial console could not be registered by printk() before
serial8250_do_startup() has finished.
Also there are some unrelated messages after the last backtrace.
Finally. we would be able to see the messages only when panic() was
called and oops_in_progress set. And there is nothing about panic().
> : We are trying an S3 suspend stress test and occasionally while
> : entering S3 we get a console deadlock.
Raul, have the system really died or have you just seen the lockdep
report, please?
> [..]
>
> > > drivers/tty/serial/8250/8250_port.c | 11 +++++++----
> > > 1 file changed, 7 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
> > > index d64ca77d9cfa..ad30991e1b3b 100644
> > > --- a/drivers/tty/serial/8250/8250_port.c
> > > +++ b/drivers/tty/serial/8250/8250_port.c
> > > @@ -2275,6 +2275,11 @@ int serial8250_do_startup(struct uart_port *port)
> > >
> > > if (port->irq && !(up->port.flags & UPF_NO_THRE_TEST)) {
> > > unsigned char iir1;
> > > + bool irq_shared = up->port.irqflags & IRQF_SHARED;
> > > +
> > > + if (irq_shared)
> > > + disable_irq_nosync(port->irq);
> > > +
> > > /*
> > > * Test for UARTs that do not reassert THRE when the
> > > * transmitter is idle and the interrupt has already
> > > @@ -2284,8 +2289,6 @@ int serial8250_do_startup(struct uart_port *port)
> > > * allow register changes to become visible.
> > > */
> > > spin_lock_irqsave(&port->lock, flags);
> > > - if (up->port.irqflags & IRQF_SHARED)
> > > - disable_irq_nosync(port->irq);
> > >
> > > wait_for_xmitr(up, UART_LSR_THRE);
> > > serial_port_out_sync(port, UART_IER, UART_IER_THRI);
> > > @@ -2297,9 +2300,9 @@ int serial8250_do_startup(struct uart_port *port)
> > > iir = serial_port_in(port, UART_IIR);
> > > serial_port_out(port, UART_IER, 0);
> > >
> > > - if (port->irqflags & IRQF_SHARED)
> > > - enable_irq(port->irq);
> > > spin_unlock_irqrestore(&port->lock, flags);
> > > + if (irq_shared)
> > > + enable_irq(port->irq);
> > >
> > > /*
> > > * If the interrupt is not reasserted, or we otherwise
> >
> > I think that it might be safe but I am not 100% sure, sigh.
>
> Yeah, I'm not 100%, but I'd give it a try.
I do not feel brave enough to ack it today. But I am all for trying it
if anyone more familiar with the code is fine with it.
Best Regards,
Petr
On (20/06/30 14:22), Petr Mladek wrote:
> On Tue 2020-06-30 19:55:12, Sergey Senozhatsky wrote:
> > On (20/06/30 12:21), Petr Mladek wrote:
> > > Anyway, I believe that this is a false positive. If I get it correctly
> > > serial8250_do_startup() must be called before the serial port could
> > > be registered as a console. It means that it could not be called
> > > from inside printk().
> >
> > >From my understanding, I'm afraid we are talking about actual deadlock
> > here, not about false positive report. Quoting the original email:
>
> IMHO, it is not clear.
>
Yea. Maybe it is, maybe it's not.
Regardless, after lockdep tells us there is something it doesn't like,
it disables itself, so false positives render lockdep useless.
> > > > @@ -2275,6 +2275,11 @@ int serial8250_do_startup(struct uart_port *port)
> > > >
> > > > if (port->irq && !(up->port.flags & UPF_NO_THRE_TEST)) {
> > > > unsigned char iir1;
> > > > + bool irq_shared = up->port.irqflags & IRQF_SHARED;
> > > > +
> > > > + if (irq_shared)
> > > > + disable_irq_nosync(port->irq);
> > > > +
> > > > /*
> > > > * Test for UARTs that do not reassert THRE when the
> > > > * transmitter is idle and the interrupt has already
> > > > @@ -2284,8 +2289,6 @@ int serial8250_do_startup(struct uart_port *port)
> > > > * allow register changes to become visible.
> > > > */
> > > > spin_lock_irqsave(&port->lock, flags);
> > > > - if (up->port.irqflags & IRQF_SHARED)
> > > > - disable_irq_nosync(port->irq);
> > > >
> > > > wait_for_xmitr(up, UART_LSR_THRE);
> > > > serial_port_out_sync(port, UART_IER, UART_IER_THRI);
> > > > @@ -2297,9 +2300,9 @@ int serial8250_do_startup(struct uart_port *port)
> > > > iir = serial_port_in(port, UART_IIR);
> > > > serial_port_out(port, UART_IER, 0);
> > > >
> > > > - if (port->irqflags & IRQF_SHARED)
> > > > - enable_irq(port->irq);
> > > > spin_unlock_irqrestore(&port->lock, flags);
> > > > + if (irq_shared)
> > > > + enable_irq(port->irq);
> > > >
> > > > /*
> > > > * If the interrupt is not reasserted, or we otherwise
> > >
> > > I think that it might be safe but I am not 100% sure, sigh.
> >
> > Yeah, I'm not 100%, but I'd give it a try.
>
> I do not feel brave enough to ack it today. But I am all for trying it
> if anyone more familiar with the code is fine with it.
I see. Well, I suppose we need Ack-s from tty/serial/8250 maintainers.
I would not be very happy if _only_ printk people Ack the patch.
-ss
* Sergey Senozhatsky <[email protected]> [200630 13:06]:
> On (20/06/30 14:22), Petr Mladek wrote:
...
> > > > > @@ -2284,8 +2289,6 @@ int serial8250_do_startup(struct uart_port *port)
> > > > > * allow register changes to become visible.
> > > > > */
> > > > > spin_lock_irqsave(&port->lock, flags);
> > > > > - if (up->port.irqflags & IRQF_SHARED)
> > > > > - disable_irq_nosync(port->irq);
> > > > >
> > > > > wait_for_xmitr(up, UART_LSR_THRE);
> > > > > serial_port_out_sync(port, UART_IER, UART_IER_THRI);
> > > > > @@ -2297,9 +2300,9 @@ int serial8250_do_startup(struct uart_port *port)
> > > > > iir = serial_port_in(port, UART_IIR);
> > > > > serial_port_out(port, UART_IER, 0);
> > > > >
> > > > > - if (port->irqflags & IRQF_SHARED)
> > > > > - enable_irq(port->irq);
> > > > > spin_unlock_irqrestore(&port->lock, flags);
> > > > > + if (irq_shared)
> > > > > + enable_irq(port->irq);
> > > > >
> > > > > /*
> > > > > * If the interrupt is not reasserted, or we otherwise
> > > >
> > > > I think that it might be safe but I am not 100% sure, sigh.
> > >
> > > Yeah, I'm not 100%, but I'd give it a try.
> >
> > I do not feel brave enough to ack it today. But I am all for trying it
> > if anyone more familiar with the code is fine with it.
>
> I see. Well, I suppose we need Ack-s from tty/serial/8250 maintainers.
> I would not be very happy if _only_ printk people Ack the patch.
This conditional disable for irq_shared does not look nice to me
from the other device point of view :)
Would it be possible to just set up te dummy interrupt handler
for the startup, then change it back afterwards? See for example
omap8250_no_handle_irq().
The other device for irq_shared should be capable of dealing
with spurious interrupts if it's shared.
Regards,
Tony
On 6/30/2020 11:32 PM, Tony Lindgren wrote:
> * Sergey Senozhatsky <[email protected]> [200630 13:06]:
>> On (20/06/30 14:22), Petr Mladek wrote:
> ...
>
>>>>>> @@ -2284,8 +2289,6 @@ int serial8250_do_startup(struct uart_port *port)
>>>>>> * allow register changes to become visible.
>>>>>> */
>>>>>> spin_lock_irqsave(&port->lock, flags);
>>>>>> - if (up->port.irqflags & IRQF_SHARED)
>>>>>> - disable_irq_nosync(port->irq);
>>>>>>
>>>>>> wait_for_xmitr(up, UART_LSR_THRE);
>>>>>> serial_port_out_sync(port, UART_IER, UART_IER_THRI);
>>>>>> @@ -2297,9 +2300,9 @@ int serial8250_do_startup(struct uart_port *port)
>>>>>> iir = serial_port_in(port, UART_IIR);
>>>>>> serial_port_out(port, UART_IER, 0);
>>>>>>
>>>>>> - if (port->irqflags & IRQF_SHARED)
>>>>>> - enable_irq(port->irq);
>>>>>> spin_unlock_irqrestore(&port->lock, flags);
>>>>>> + if (irq_shared)
>>>>>> + enable_irq(port->irq);
>>>>>>
>>>>>> /*
>>>>>> * If the interrupt is not reasserted, or we otherwise
>>>>> I think that it might be safe but I am not 100% sure, sigh.
>>>> Yeah, I'm not 100%, but I'd give it a try.
>>> I do not feel brave enough to ack it today. But I am all for trying it
>>> if anyone more familiar with the code is fine with it.
>> I see. Well, I suppose we need Ack-s from tty/serial/8250 maintainers.
>> I would not be very happy if _only_ printk people Ack the patch.
FWIW, the lockdep trace is not seen anymore with the patch applied.
Regards,
Shirish S
> This conditional disable for irq_shared does not look nice to me
> from the other device point of view :)
>
> Would it be possible to just set up te dummy interrupt handler
> for the startup, then change it back afterwards? See for example
> omap8250_no_handle_irq().
>
> The other device for irq_shared should be capable of dealing
> with spurious interrupts if it's shared.
>
> Regards,
>
> Tony
--
Regards,
Shirish S
Hi All,
Can we land this patch upstream?
Feel free to add my tested-by.
Thanks.
Regards,
Shirish S
-----Original Message-----
From: S, Shirish <[email protected]>
Sent: Wednesday, July 1, 2020 12:15 PM
To: Tony Lindgren <[email protected]>; Sergey Senozhatsky <[email protected]>
Cc: Petr Mladek <[email protected]>; Andy Shevchenko <[email protected]>; Raul Rangel <[email protected]>; Sergey Senozhatsky <[email protected]>; linux-kernel <[email protected]>; Greg Kroah-Hartman <[email protected]>; Andy Shevchenko <[email protected]>; [email protected]; S, Shirish <[email protected]>; Peter Zijlstra <[email protected]>; John Ogness <[email protected]>; Steven Rostedt <[email protected]>
Subject: Re: UART/TTY console deadlock
On 6/30/2020 11:32 PM, Tony Lindgren wrote:
> * Sergey Senozhatsky <[email protected]> [200630 13:06]:
>> On (20/06/30 14:22), Petr Mladek wrote:
> ...
>
>>>>>> @@ -2284,8 +2289,6 @@ int serial8250_do_startup(struct uart_port *port)
>>>>>> * allow register changes to become visible.
>>>>>> */
>>>>>> spin_lock_irqsave(&port->lock, flags);
>>>>>> - if (up->port.irqflags & IRQF_SHARED)
>>>>>> - disable_irq_nosync(port->irq);
>>>>>>
>>>>>> wait_for_xmitr(up, UART_LSR_THRE);
>>>>>> serial_port_out_sync(port, UART_IER, UART_IER_THRI); @@
>>>>>> -2297,9 +2300,9 @@ int serial8250_do_startup(struct uart_port *port)
>>>>>> iir = serial_port_in(port, UART_IIR);
>>>>>> serial_port_out(port, UART_IER, 0);
>>>>>>
>>>>>> - if (port->irqflags & IRQF_SHARED)
>>>>>> - enable_irq(port->irq);
>>>>>> spin_unlock_irqrestore(&port->lock, flags);
>>>>>> + if (irq_shared)
>>>>>> + enable_irq(port->irq);
>>>>>>
>>>>>> /*
>>>>>> * If the interrupt is not reasserted, or we otherwise
>>>>> I think that it might be safe but I am not 100% sure, sigh.
>>>> Yeah, I'm not 100%, but I'd give it a try.
>>> I do not feel brave enough to ack it today. But I am all for trying
>>> it if anyone more familiar with the code is fine with it.
>> I see. Well, I suppose we need Ack-s from tty/serial/8250 maintainers.
>> I would not be very happy if _only_ printk people Ack the patch.
FWIW, the lockdep trace is not seen anymore with the patch applied.
Regards,
Shirish S
> This conditional disable for irq_shared does not look nice to me from
> the other device point of view :)
>
> Would it be possible to just set up te dummy interrupt handler for the
> startup, then change it back afterwards? See for example
> omap8250_no_handle_irq().
>
> The other device for irq_shared should be capable of dealing with
> spurious interrupts if it's shared.
>
> Regards,
>
> Tony
--
Regards,
Shirish S
On (20/06/30 11:02), Tony Lindgren wrote:
> This conditional disable for irq_shared does not look nice to me
> from the other device point of view :)
>
> Would it be possible to just set up te dummy interrupt handler
> for the startup, then change it back afterwards? See for example
> omap8250_no_handle_irq().
I think we can do it. serial8250_do_startup() and irq handler take
port->lock, so they should be synchronized.
Something like this then?
---
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index f77bf820b7a3..61ad6f5f0f88 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -2140,6 +2140,11 @@ static void serial8250_put_poll_char(struct uart_port *port,
#endif /* CONFIG_CONSOLE_POLL */
+static int serial8250_dummy_handle_irq(struct uart_port *port)
+{
+ return 0;
+}
+
int serial8250_do_startup(struct uart_port *port)
{
struct uart_8250_port *up = up_to_u8250p(port);
@@ -2271,6 +2276,9 @@ int serial8250_do_startup(struct uart_port *port)
if (port->irq && !(up->port.flags & UPF_NO_THRE_TEST)) {
unsigned char iir1;
+ int (*handler)(struct uart_port *);
+
+ handler = port->handle_irq;
/*
* Test for UARTs that do not reassert THRE when the
* transmitter is idle and the interrupt has already
@@ -2280,8 +2288,7 @@ int serial8250_do_startup(struct uart_port *port)
* allow register changes to become visible.
*/
spin_lock_irqsave(&port->lock, flags);
- if (up->port.irqflags & IRQF_SHARED)
- disable_irq_nosync(port->irq);
+ port->handle_irq = serial8250_dummy_handle_irq;
wait_for_xmitr(up, UART_LSR_THRE);
serial_port_out_sync(port, UART_IER, UART_IER_THRI);
@@ -2293,8 +2300,7 @@ int serial8250_do_startup(struct uart_port *port)
iir = serial_port_in(port, UART_IIR);
serial_port_out(port, UART_IER, 0);
- if (port->irqflags & IRQF_SHARED)
- enable_irq(port->irq);
+ port->handle_irq = handler;
spin_unlock_irqrestore(&port->lock, flags);
/*
On (20/07/02 14:12), Sergey Senozhatsky wrote:
> Date: Thu, 2 Jul 2020 14:12:13 +0900
> From: Sergey Senozhatsky <[email protected]>
> To: Tony Lindgren <[email protected]>
> Cc: Sergey Senozhatsky <[email protected]>, Petr Mladek
> <[email protected]>, Andy Shevchenko <[email protected]>, Raul
> Rangel <[email protected]>, Sergey Senozhatsky
> <[email protected]>, linux-kernel
> <[email protected]>, Greg Kroah-Hartman
> <[email protected]>, Andy Shevchenko
> <[email protected]>, [email protected], "S, Shirish"
> <[email protected]>, Peter Zijlstra <[email protected]>, John Ogness
> <[email protected]>, Steven Rostedt <[email protected]>
> Subject: Re: UART/TTY console deadlock
> Message-ID: <[email protected]>
>
> On (20/06/30 11:02), Tony Lindgren wrote:
> > This conditional disable for irq_shared does not look nice to me
> > from the other device point of view :)
> >
> > Would it be possible to just set up te dummy interrupt handler
> > for the startup, then change it back afterwards? See for example
> > omap8250_no_handle_irq().
>
> I think we can do it. serial8250_do_startup() and irq handler take
> port->lock, so they should be synchronized.
Hmm, hold on. Why does it disable IRQ in the first place? IRQ handlers
should grab the port->lock. So if there is already running IRQ, then
serial8250_do_startup() will wait until IRQ handler unlocks the port->lock.
If serial8250_do_startup() grabs the port->lock first, then IRQ will wait
for serial8250_do_startup() to unlock it. serial8250_do_startup() does
not release the port->unlock until its done:
spin_lock_irqsave(&port->lock, flags);
wait_for_xmitr(up, UART_LSR_THRE);
serial_port_out_sync(port, UART_IER, UART_IER_THRI);
udelay(1); /* allow THRE to set */
iir1 = serial_port_in(port, UART_IIR);
serial_port_out(port, UART_IER, 0);
serial_port_out_sync(port, UART_IER, UART_IER_THRI);
udelay(1); /* allow a working UART time to re-assert THRE */
iir = serial_port_in(port, UART_IIR);
serial_port_out(port, UART_IER, 0);
spin_unlock_irqrestore(&port->lock, flags);
so IRQ will not see the inconsistent device state.
What exactly is the purpose of disable_irq_nosync()? Can we just remove
disable_irq_nosync()/enable_irq() instead? Are there any IRQ handlers
that don't acquire the port->lock?
-ss
On Thu, Jul 02, 2020 at 03:48:43AM +0000, S, Shirish wrote:
> Hi All,
>
> Can we land this patch upstream?
What patch?
> Feel free to add my tested-by.
Please send it in a form that I can add it, to the patch in question.
thanks,
greg k-h
This one:
https://lkml.org/lkml/2020/6/30/394
I did reply to all, not sure what I am missing while replying.
Regards,
Shirish S
-----Original Message-----
From: Greg Kroah-Hartman <[email protected]>
Sent: Thursday, July 2, 2020 11:42 AM
To: S, Shirish <[email protected]>
Cc: Tony Lindgren <[email protected]>; Sergey Senozhatsky <[email protected]>; Petr Mladek <[email protected]>; Andy Shevchenko <[email protected]>; Raul Rangel <[email protected]>; Sergey Senozhatsky <[email protected]>; linux-kernel <[email protected]>; Andy Shevchenko <[email protected]>; [email protected]; Peter Zijlstra <[email protected]>; John Ogness <[email protected]>; Steven Rostedt <[email protected]>
Subject: Re: UART/TTY console deadlock
On Thu, Jul 02, 2020 at 03:48:43AM +0000, S, Shirish wrote:
> Hi All,
>
> Can we land this patch upstream?
What patch?
> Feel free to add my tested-by.
Please send it in a form that I can add it, to the patch in question.
thanks,
greg k-h
On (20/07/02 06:14), S, Shirish wrote:
> This one:
> https://lkml.org/lkml/2020/6/30/394
>
> I did reply to all, not sure what I am missing while replying.
It's not a formal patch yet. Give us some more time to figure out
how do we want to fix the problem at hand.
-ss
A: http://en.wikipedia.org/wiki/Top_post
Q: Were do I find info about this thing called top-posting?
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?
A: No.
Q: Should I include quotations after my reply?
http://daringfireball.net/2007/07/on_top
On Thu, Jul 02, 2020 at 06:14:05AM +0000, S, Shirish wrote:
> This one:
> https://lkml.org/lkml/2020/6/30/394
Please use lore.kernel.org for links, lkml.org is known for having
reliability issues and is not under our control.
> I did reply to all, not sure what I am missing while replying.
Proper email quoting?
thanks,
greg k-h
On Thu, Jul 02, 2020 at 02:40:33PM +0900, Sergey Senozhatsky wrote:
> On (20/07/02 14:12), Sergey Senozhatsky wrote:
> > From: Sergey Senozhatsky <[email protected]>
> > On (20/06/30 11:02), Tony Lindgren wrote:
...
> > I think we can do it. serial8250_do_startup() and irq handler take
> > port->lock, so they should be synchronized.
>
> Hmm, hold on. Why does it disable IRQ in the first place? IRQ handlers
> should grab the port->lock. So if there is already running IRQ, then
> serial8250_do_startup() will wait until IRQ handler unlocks the port->lock.
> If serial8250_do_startup() grabs the port->lock first, then IRQ will wait
> for serial8250_do_startup() to unlock it. serial8250_do_startup() does
> not release the port->unlock until its done:
>
> spin_lock_irqsave(&port->lock, flags);
>
> wait_for_xmitr(up, UART_LSR_THRE);
> serial_port_out_sync(port, UART_IER, UART_IER_THRI);
> udelay(1); /* allow THRE to set */
> iir1 = serial_port_in(port, UART_IIR);
> serial_port_out(port, UART_IER, 0);
> serial_port_out_sync(port, UART_IER, UART_IER_THRI);
> udelay(1); /* allow a working UART time to re-assert THRE */
> iir = serial_port_in(port, UART_IIR);
> serial_port_out(port, UART_IER, 0);
>
> spin_unlock_irqrestore(&port->lock, flags);
>
> so IRQ will not see the inconsistent device state.
>
> What exactly is the purpose of disable_irq_nosync()? Can we just remove
> disable_irq_nosync()/enable_irq() instead? Are there any IRQ handlers
> that don't acquire the port->lock?
I didn't look into this deeply, but my understanding that this is something for
special case when you have several UART ports sharing the IRQ (multi-port card)
and IRQ even maybe undesirable b/c it will confuse real IRQ handler. I don't
remember details, but AFAIR IRQ handler does a busyloop to service as much as
possible and in between it may release the lock (again, multi-port UART cards),
that's why we better avoid IRQ event in the first place.
But it's my pure speculation.
--
With Best Regards,
Andy Shevchenko
* Sergey Senozhatsky <[email protected]> [200702 05:13]:
> On (20/06/30 11:02), Tony Lindgren wrote:
> > This conditional disable for irq_shared does not look nice to me
> > from the other device point of view :)
> >
> > Would it be possible to just set up te dummy interrupt handler
> > for the startup, then change it back afterwards? See for example
> > omap8250_no_handle_irq().
>
> I think we can do it. serial8250_do_startup() and irq handler take
> port->lock, so they should be synchronized.
>
> Something like this then?
Yeah thanks this should work better for shared irq.
Regards,
Tony
> diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
> index f77bf820b7a3..61ad6f5f0f88 100644
> --- a/drivers/tty/serial/8250/8250_port.c
> +++ b/drivers/tty/serial/8250/8250_port.c
> @@ -2140,6 +2140,11 @@ static void serial8250_put_poll_char(struct uart_port *port,
>
> #endif /* CONFIG_CONSOLE_POLL */
>
> +static int serial8250_dummy_handle_irq(struct uart_port *port)
> +{
> + return 0;
> +}
> +
> int serial8250_do_startup(struct uart_port *port)
> {
> struct uart_8250_port *up = up_to_u8250p(port);
> @@ -2271,6 +2276,9 @@ int serial8250_do_startup(struct uart_port *port)
>
> if (port->irq && !(up->port.flags & UPF_NO_THRE_TEST)) {
> unsigned char iir1;
> + int (*handler)(struct uart_port *);
> +
> + handler = port->handle_irq;
> /*
> * Test for UARTs that do not reassert THRE when the
> * transmitter is idle and the interrupt has already
> @@ -2280,8 +2288,7 @@ int serial8250_do_startup(struct uart_port *port)
> * allow register changes to become visible.
> */
> spin_lock_irqsave(&port->lock, flags);
> - if (up->port.irqflags & IRQF_SHARED)
> - disable_irq_nosync(port->irq);
> + port->handle_irq = serial8250_dummy_handle_irq;
>
> wait_for_xmitr(up, UART_LSR_THRE);
> serial_port_out_sync(port, UART_IER, UART_IER_THRI);
> @@ -2293,8 +2300,7 @@ int serial8250_do_startup(struct uart_port *port)
> iir = serial_port_in(port, UART_IIR);
> serial_port_out(port, UART_IER, 0);
>
> - if (port->irqflags & IRQF_SHARED)
> - enable_irq(port->irq);
> + port->handle_irq = handler;
> spin_unlock_irqrestore(&port->lock, flags);
>
> /*
On (20/07/02 09:05), Tony Lindgren wrote:
> * Sergey Senozhatsky <[email protected]> [200702 05:13]:
> > On (20/06/30 11:02), Tony Lindgren wrote:
> > > This conditional disable for irq_shared does not look nice to me
> > > from the other device point of view :)
> > >
> > > Would it be possible to just set up te dummy interrupt handler
> > > for the startup, then change it back afterwards? See for example
> > > omap8250_no_handle_irq().
> >
> > I think we can do it. serial8250_do_startup() and irq handler take
> > port->lock, so they should be synchronized.
> >
> > Something like this then?
>
> Yeah thanks this should work better for shared irq.
This is, basically, an equivalent of
---
drivers/tty/serial/8250/8250_port.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index d64ca77d9cfa..dba7747d2ddd 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -2275,6 +2275,7 @@ int serial8250_do_startup(struct uart_port *port)
if (port->irq && !(up->port.flags & UPF_NO_THRE_TEST)) {
unsigned char iir1;
+
/*
* Test for UARTs that do not reassert THRE when the
* transmitter is idle and the interrupt has already
@@ -2284,8 +2285,6 @@ int serial8250_do_startup(struct uart_port *port)
* allow register changes to become visible.
*/
spin_lock_irqsave(&port->lock, flags);
- if (up->port.irqflags & IRQF_SHARED)
- disable_irq_nosync(port->irq);
wait_for_xmitr(up, UART_LSR_THRE);
serial_port_out_sync(port, UART_IER, UART_IER_THRI);
@@ -2297,8 +2296,6 @@ int serial8250_do_startup(struct uart_port *port)
iir = serial_port_in(port, UART_IIR);
serial_port_out(port, UART_IER, 0);
- if (port->irqflags & IRQF_SHARED)
- enable_irq(port->irq);
spin_unlock_irqrestore(&port->lock, flags);
/*
--
2.27.0
On (20/07/02 11:20), Andy Shevchenko wrote:
>
> I didn't look into this deeply, but my understanding that this is something for
> special case when you have several UART ports sharing the IRQ (multi-port card)
> and IRQ even maybe undesirable b/c it will confuse real IRQ handler. I don't
> remember details, but AFAIR IRQ handler does a busyloop to service as much as
> possible and in between it may release the lock (again, multi-port UART cards),
> that's why we better avoid IRQ event in the first place.
> But it's my pure speculation.
Hmm. Would different IRQ handlers use different UART port structures? ->irq
might be the only thing they share. Even if they use the same port, we
keep IRQs disabled on local CPU, and UART port locked. To unlock the UART
port during IRQ, the handler (on another CPU) first needs to acquire the
port->lock, which is locked by serial8250_do_startup().
-ss
On Fri, Jul 3, 2020 at 1:32 PM Sergey Senozhatsky
<[email protected]> wrote:
>
> On (20/07/02 09:05), Tony Lindgren wrote:
> > * Sergey Senozhatsky <[email protected]> [200702 05:13]:
> > > On (20/06/30 11:02), Tony Lindgren wrote:
> > > > This conditional disable for irq_shared does not look nice to me
> > > > from the other device point of view :)
> > > >
> > > > Would it be possible to just set up te dummy interrupt handler
> > > > for the startup, then change it back afterwards? See for example
> > > > omap8250_no_handle_irq().
> > >
> > > I think we can do it. serial8250_do_startup() and irq handler take
> > > port->lock, so they should be synchronized.
> > >
> > > Something like this then?
> >
> > Yeah thanks this should work better for shared irq.
>
> This is, basically, an equivalent of
>
> ---
> drivers/tty/serial/8250/8250_port.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
> index d64ca77d9cfa..dba7747d2ddd 100644
> --- a/drivers/tty/serial/8250/8250_port.c
> +++ b/drivers/tty/serial/8250/8250_port.c
> @@ -2275,6 +2275,7 @@ int serial8250_do_startup(struct uart_port *port)
>
> if (port->irq && !(up->port.flags & UPF_NO_THRE_TEST)) {
> unsigned char iir1;
> +
> /*
> * Test for UARTs that do not reassert THRE when the
> * transmitter is idle and the interrupt has already
> @@ -2284,8 +2285,6 @@ int serial8250_do_startup(struct uart_port *port)
> * allow register changes to become visible.
> */
> spin_lock_irqsave(&port->lock, flags);
> - if (up->port.irqflags & IRQF_SHARED)
> - disable_irq_nosync(port->irq);
>
> wait_for_xmitr(up, UART_LSR_THRE);
> serial_port_out_sync(port, UART_IER, UART_IER_THRI);
> @@ -2297,8 +2296,6 @@ int serial8250_do_startup(struct uart_port *port)
> iir = serial_port_in(port, UART_IIR);
> serial_port_out(port, UART_IER, 0);
>
> - if (port->irqflags & IRQF_SHARED)
> - enable_irq(port->irq);
> spin_unlock_irqrestore(&port->lock, flags);
>
> /*
...which effectively is a revert of
768aec0b5bcc ("serial: 8250: fix shared interrupts issues with SMP and
RT kernels")
--
With Best Regards,
Andy Shevchenko
On Fri, Jul 3, 2020 at 1:53 PM Sergey Senozhatsky
<[email protected]> wrote:
>
> On (20/07/02 11:20), Andy Shevchenko wrote:
> >
> > I didn't look into this deeply, but my understanding that this is something for
> > special case when you have several UART ports sharing the IRQ (multi-port card)
> > and IRQ even maybe undesirable b/c it will confuse real IRQ handler. I don't
> > remember details, but AFAIR IRQ handler does a busyloop to service as much as
> > possible and in between it may release the lock (again, multi-port UART cards),
> > that's why we better avoid IRQ event in the first place.
> > But it's my pure speculation.
>
> Hmm. Would different IRQ handlers use different UART port structures? ->irq
> might be the only thing they share. Even if they use the same port, we
> keep IRQs disabled on local CPU, and UART port locked. To unlock the UART
> port during IRQ, the handler (on another CPU) first needs to acquire the
> port->lock, which is locked by serial8250_do_startup().
It appears to be a RT kernel related. Don't know the state of affairs
in RT nowadays, but maybe still relevant.
See the commit 768aec0b5bcc ("serial: 8250: fix shared interrupts
issues with SMP and RT kernels").
--
With Best Regards,
Andy Shevchenko
On Sat, Jul 04, 2020 at 02:35:46PM +0300, Andy Shevchenko wrote:
> On Fri, Jul 3, 2020 at 1:32 PM Sergey Senozhatsky
> <[email protected]> wrote:
> > On (20/07/02 09:05), Tony Lindgren wrote:
> > > * Sergey Senozhatsky <[email protected]> [200702 05:13]:
> > > > On (20/06/30 11:02), Tony Lindgren wrote:
...
> > This is, basically, an equivalent of
> >
> > ---
> > drivers/tty/serial/8250/8250_port.c | 5 +----
> > 1 file changed, 1 insertion(+), 4 deletions(-)
> >
> > diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
> > index d64ca77d9cfa..dba7747d2ddd 100644
> > --- a/drivers/tty/serial/8250/8250_port.c
> > +++ b/drivers/tty/serial/8250/8250_port.c
> > @@ -2275,6 +2275,7 @@ int serial8250_do_startup(struct uart_port *port)
> >
> > if (port->irq && !(up->port.flags & UPF_NO_THRE_TEST)) {
> > unsigned char iir1;
> > +
> > /*
> > * Test for UARTs that do not reassert THRE when the
> > * transmitter is idle and the interrupt has already
> > @@ -2284,8 +2285,6 @@ int serial8250_do_startup(struct uart_port *port)
> > * allow register changes to become visible.
> > */
> > spin_lock_irqsave(&port->lock, flags);
> > - if (up->port.irqflags & IRQF_SHARED)
> > - disable_irq_nosync(port->irq);
> >
> > wait_for_xmitr(up, UART_LSR_THRE);
> > serial_port_out_sync(port, UART_IER, UART_IER_THRI);
> > @@ -2297,8 +2296,6 @@ int serial8250_do_startup(struct uart_port *port)
> > iir = serial_port_in(port, UART_IIR);
> > serial_port_out(port, UART_IER, 0);
> >
> > - if (port->irqflags & IRQF_SHARED)
> > - enable_irq(port->irq);
> > spin_unlock_irqrestore(&port->lock, flags);
> >
> > /*
>
> ...which effectively is a revert of
>
> 768aec0b5bcc ("serial: 8250: fix shared interrupts issues with SMP and
> RT kernels")
(without c389d27b5e64 ("8250.c: port.lock is irq-safe") applied)
--
With Best Regards,
Andy Shevchenko
Hi,
On Sat Jul 04 2020, Andy Shevchenko wrote:
> On Fri, Jul 3, 2020 at 1:32 PM Sergey Senozhatsky
> <[email protected]> wrote:
>>
>> On (20/07/02 09:05), Tony Lindgren wrote:
>> > * Sergey Senozhatsky <[email protected]> [200702 05:13]:
>> > > On (20/06/30 11:02), Tony Lindgren wrote:
>> > > > This conditional disable for irq_shared does not look nice to me
>> > > > from the other device point of view :)
>> > > >
>> > > > Would it be possible to just set up te dummy interrupt handler
>> > > > for the startup, then change it back afterwards? See for example
>> > > > omap8250_no_handle_irq().
>> > >
>> > > I think we can do it. serial8250_do_startup() and irq handler take
>> > > port->lock, so they should be synchronized.
>> > >
>> > > Something like this then?
>> >
>> > Yeah thanks this should work better for shared irq.
>>
>> This is, basically, an equivalent of
>>
>> ---
>> drivers/tty/serial/8250/8250_port.c | 5 +----
>> 1 file changed, 1 insertion(+), 4 deletions(-)
>>
>> diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
>> index d64ca77d9cfa..dba7747d2ddd 100644
>> --- a/drivers/tty/serial/8250/8250_port.c
>> +++ b/drivers/tty/serial/8250/8250_port.c
>> @@ -2275,6 +2275,7 @@ int serial8250_do_startup(struct uart_port *port)
>>
>> if (port->irq && !(up->port.flags & UPF_NO_THRE_TEST)) {
>> unsigned char iir1;
>> +
>> /*
>> * Test for UARTs that do not reassert THRE when the
>> * transmitter is idle and the interrupt has already
>> @@ -2284,8 +2285,6 @@ int serial8250_do_startup(struct uart_port *port)
>> * allow register changes to become visible.
>> */
>> spin_lock_irqsave(&port->lock, flags);
>> - if (up->port.irqflags & IRQF_SHARED)
>> - disable_irq_nosync(port->irq);
>>
>> wait_for_xmitr(up, UART_LSR_THRE);
>> serial_port_out_sync(port, UART_IER, UART_IER_THRI);
>> @@ -2297,8 +2296,6 @@ int serial8250_do_startup(struct uart_port *port)
>> iir = serial_port_in(port, UART_IIR);
>> serial_port_out(port, UART_IER, 0);
>>
>> - if (port->irqflags & IRQF_SHARED)
>> - enable_irq(port->irq);
>> spin_unlock_irqrestore(&port->lock, flags);
>>
>> /*
>
> ...which effectively is a revert of
>
> 768aec0b5bcc ("serial: 8250: fix shared interrupts issues with SMP and
> RT kernels")
Please, don't revert that commit. I've faced the same issue as described
in the commit log. There is hardware available with shared UART
interrupt lines.
Thanks,
Kurt
On (20/07/06 13:31), Kurt Kanzenbach wrote:
> >> @@ -2275,6 +2275,7 @@ int serial8250_do_startup(struct uart_port *port)
> >>
> >> if (port->irq && !(up->port.flags & UPF_NO_THRE_TEST)) {
> >> unsigned char iir1;
> >> +
> >> /*
> >> * Test for UARTs that do not reassert THRE when the
> >> * transmitter is idle and the interrupt has already
> >> @@ -2284,8 +2285,6 @@ int serial8250_do_startup(struct uart_port *port)
> >> * allow register changes to become visible.
> >> */
> >> spin_lock_irqsave(&port->lock, flags);
> >> - if (up->port.irqflags & IRQF_SHARED)
> >> - disable_irq_nosync(port->irq);
> >>
> >> wait_for_xmitr(up, UART_LSR_THRE);
> >> serial_port_out_sync(port, UART_IER, UART_IER_THRI);
> >> @@ -2297,8 +2296,6 @@ int serial8250_do_startup(struct uart_port *port)
> >> iir = serial_port_in(port, UART_IIR);
> >> serial_port_out(port, UART_IER, 0);
> >>
> >> - if (port->irqflags & IRQF_SHARED)
> >> - enable_irq(port->irq);
> >> spin_unlock_irqrestore(&port->lock, flags);
> >>
> >> /*
> >
> > ...which effectively is a revert of
> >
> > 768aec0b5bcc ("serial: 8250: fix shared interrupts issues with SMP and
> > RT kernels")
>
> Please, don't revert that commit. I've faced the same issue as described
> in the commit log. There is hardware available with shared UART
> interrupt lines.
Will this patch break that hardware?
https://lore.kernel.org/lkml/[email protected]/
-ss
On Mon Jul 06 2020, Sergey Senozhatsky wrote:
> On (20/07/06 13:31), Kurt Kanzenbach wrote:
>> >> @@ -2275,6 +2275,7 @@ int serial8250_do_startup(struct uart_port *port)
>> >>
>> >> if (port->irq && !(up->port.flags & UPF_NO_THRE_TEST)) {
>> >> unsigned char iir1;
>> >> +
>> >> /*
>> >> * Test for UARTs that do not reassert THRE when the
>> >> * transmitter is idle and the interrupt has already
>> >> @@ -2284,8 +2285,6 @@ int serial8250_do_startup(struct uart_port *port)
>> >> * allow register changes to become visible.
>> >> */
>> >> spin_lock_irqsave(&port->lock, flags);
>> >> - if (up->port.irqflags & IRQF_SHARED)
>> >> - disable_irq_nosync(port->irq);
>> >>
>> >> wait_for_xmitr(up, UART_LSR_THRE);
>> >> serial_port_out_sync(port, UART_IER, UART_IER_THRI);
>> >> @@ -2297,8 +2296,6 @@ int serial8250_do_startup(struct uart_port *port)
>> >> iir = serial_port_in(port, UART_IIR);
>> >> serial_port_out(port, UART_IER, 0);
>> >>
>> >> - if (port->irqflags & IRQF_SHARED)
>> >> - enable_irq(port->irq);
>> >> spin_unlock_irqrestore(&port->lock, flags);
>> >>
>> >> /*
>> >
>> > ...which effectively is a revert of
>> >
>> > 768aec0b5bcc ("serial: 8250: fix shared interrupts issues with SMP and
>> > RT kernels")
>>
>> Please, don't revert that commit. I've faced the same issue as described
>> in the commit log. There is hardware available with shared UART
>> interrupt lines.
>
> Will this patch break that hardware?
> https://lore.kernel.org/lkml/[email protected]/
I'm not sure how this patch will help with the situation. Because at the
point of that THRE test the irq handler isn't registered. It's
registered a few lines below (up->ops->setup_irq()) meaning the irq line
has to be disabled if shared. Otherwise the kernel might detect a
spurious irq and disables it. That's at least my understanding of the
problem (see commit message from 54e53b2e8081 ("tty: serial: 8250: pass
IRQ shared flag to UART ports")).
Thanks,
Kurt
On (20/07/08 09:40), Kurt Kanzenbach wrote:
> I'm not sure how this patch will help with the situation. Because at the
> point of that THRE test the irq handler isn't registered. It's
> registered a few lines below (up->ops->setup_irq()) meaning the irq line
> has to be disabled if shared. Otherwise the kernel might detect a
> spurious irq and disables it. That's at least my understanding of the
> problem (see commit message from 54e53b2e8081 ("tty: serial: 8250: pass
> IRQ shared flag to UART ports")).
So the only remaining approach then is to move
disable_irq_nosync()/enable_irq() out of port->lock
scope.
-ss
On Wed 2020-07-08 17:07:12, Sergey Senozhatsky wrote:
> On (20/07/08 09:40), Kurt Kanzenbach wrote:
> > I'm not sure how this patch will help with the situation. Because at the
> > point of that THRE test the irq handler isn't registered. It's
> > registered a few lines below (up->ops->setup_irq()) meaning the irq line
> > has to be disabled if shared. Otherwise the kernel might detect a
> > spurious irq and disables it. That's at least my understanding of the
> > problem (see commit message from 54e53b2e8081 ("tty: serial: 8250: pass
> > IRQ shared flag to UART ports")).
>
> So the only remaining approach then is to move
> disable_irq_nosync()/enable_irq() out of port->lock
> scope.
The ultimate solution is the offload of printk console handling to
kthreads. I know that it takes ages to get it in. But it will
also take ages to go around all the cyclic dependencies.
People tend to use printk() anywhere. And any lock taken by
printk() is prone to cause these problems.
We have tried to go around this in the scheduler code by
printk_deferred() and it was never-ending story. I believe
that the port->lock related cycles are another never-ending
story. And attempts to shuffle locks are always complicated
and error-prone. They require deep understanding of the affected
code paths. The approach with printk_deferred() has been pretty
safe at least.
Another approach might be to teach lockdep to remove lock chains
that were taken during the system initialization and will never
happen again. Or something like this.
I still believe that this is a false positive. Console that is being
initialized can't be used by printk() at the same moment. Locks taken
only when the console is being initialized can't cause deadlocks
when the fully initialized console is registered later.
Best Regards,
Petr
On (20/07/08 11:52), Petr Mladek wrote:
>
> I still believe that this is a false positive.
I've no problems with that. The thing that I'm trying to improve here is the
fact that from lockdep's point of view there are no false positives. lockdep
reports the problem and then waves good bye. It disables itself. There might
be some real suspend locking problems on some particular setups, but it's
hard to tell, because console startup blows up the lockdep first.
-ss
Hey everyone,
So it looks like this was a red herring. We were getting spurious
interrupts which I suspect caused the console code to get called in
some critical section. Once we fixed the spurious interrupts we no
longer see the issue. Sorry for the noise.
Thanks,
Raul
On Thu, Jul 9, 2020 at 7:22 AM Sergey Senozhatsky
<[email protected]> wrote:
>
> On (20/07/08 11:52), Petr Mladek wrote:
> >
> > I still believe that this is a false positive.
>
> I've no problems with that. The thing that I'm trying to improve here is the
> fact that from lockdep's point of view there are no false positives. lockdep
> reports the problem and then waves good bye. It disables itself. There might
> be some real suspend locking problems on some particular setups, but it's
> hard to tell, because console startup blows up the lockdep first.
>
> -ss