2022-05-12 23:05:57

by Byungchul Park

[permalink] [raw]
Subject: [REPORT] syscall reboot + umh + firmware fallback

+cc [email protected] (firmware)
+cc [email protected] (syscall reboot)

Hi Luis, Robin and folks,

I'm developing a tool for lockup detection, DETP(Dependency Tracker).
I got a DEPT report from Hyeonggon - Thanks, Hyeonggon!

It doesn't mean the code *definitely* has a deadlock. However, it looks
problematic to me. So I'd like to ask things to see if it is actually.

Because Hyeonggon didn't run decode_stacktrace.sh before sending it to
me, I don't have a report having a better debugging information. But I
can explain it in this mail. The problematic scenario looks like:


PROCESS A PROCESS B WORKER C

__do_sys_reboot()
__do_sys_reboot()
mutex_lock(&system_transition_mutex)
... mutex_lock(&system_transition_mutex) <- stuck
...
request_firmware_work_func()
_request_firmware()
firmware_fallback_sysfs()
usermodehelper_read_lock_wait()
down_read(&umhelper_sem)
...
fw_load_sysfs_fallback()
fw_sysfs_wait_timeout()
wait_for_completion_killable_timeout(&fw_st->completion) <- stuck
kernel_halt()
__usermodehelper_disable()
down_write(&umhelper_sem) <- stuck

--------------------------------------------------------
All the 3 contexts are stuck at this point.
--------------------------------------------------------

PROCESS A PROCESS B WORKER C

...
up_write(&umhelper_sem)
...
mutex_unlock(&system_transition_mutex) <- cannot wake up B

...
kernel_halt()
notifier_call_chain()
hw_shutdown_notify()
kill_pending_fw_fallback_reqs()
__fw_load_abort()
complete_all(&fw_st->completion) <- cannot wake up C

...
usermodeheler_read_unlock()
up_read(&umhelper_sem) <- cannot wake up A


So I think this scenario is problematic. Or am I missing something here?
Or do you think it's okay because the wait_for_completion_*() has a
timeout? AFAIK, timeouts are not supposed to fire in normal cases.

It'd be appreciated if you share your opinion on the report.

Byungchul

---

[ 18.136012][ T1] ===================================================
[ 18.136419][ T1] DEPT: Circular dependency has been detected.
[ 18.136782][ T1] 5.18.0-rc3-57979-gc2b89afca919 #2374 Tainted: G B
[ 18.137249][ T1] ---------------------------------------------------
[ 18.137649][ T1] summary
[ 18.137823][ T1] ---------------------------------------------------
[ 18.138222][ T1] *** DEADLOCK ***
[ 18.138222][ T1]
[ 18.138569][ T1] context A
[ 18.138754][ T1] [S] __mutex_lock_common(system_transition_mutex:0)
[ 18.139170][ T1] [W] down_write(umhelper_sem:0)
[ 18.139482][ T1] [E] mutex_unlock(system_transition_mutex:0)
[ 18.139865][ T1]
[ 18.140004][ T1] context B
[ 18.140189][ T1] [S] (unknown)(&fw_st->completion:0)
[ 18.140527][ T1] [W] __mutex_lock_common(system_transition_mutex:0)
[ 18.140942][ T1] [E] complete_all(&fw_st->completion:0)
[ 18.141295][ T1]
[ 18.141434][ T1] context C
[ 18.141618][ T1] [S] down_read(umhelper_sem:0)
[ 18.141926][ T1] [W] wait_for_completion_killable_timeout(&fw_st->completion:0)
[ 18.142402][ T1] [E] up_read(umhelper_sem:0)
[ 18.142699][ T1]
[ 18.142837][ T1] [S]: start of the event context
[ 18.143134][ T1] [W]: the wait blocked
[ 18.143379][ T1] [E]: the event not reachable
[ 18.143661][ T1] ---------------------------------------------------
[ 18.144063][ T1] context A's detail
[ 18.144293][ T1] ---------------------------------------------------
[ 18.144691][ T1] context A
[ 18.144875][ T1] [S] __mutex_lock_common(system_transition_mutex:0)
[ 18.145290][ T1] [W] down_write(umhelper_sem:0)
[ 18.145602][ T1] [E] mutex_unlock(system_transition_mutex:0)
[ 18.145982][ T1]
[ 18.146120][ T1] [S] __mutex_lock_common(system_transition_mutex:0):
[ 18.146519][ T1] [<ffffffff810ee14c>] __do_sys_reboot+0x11f/0x24f
[ 18.146907][ T1] stacktrace:
[ 18.147101][ T1] __mutex_lock+0x1f3/0x3f3
[ 18.147396][ T1] __do_sys_reboot+0x11f/0x24f
[ 18.147706][ T1] do_syscall_64+0xd4/0xfb
[ 18.148001][ T1] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 18.148379][ T1]
[ 18.148517][ T1] [W] down_write(umhelper_sem:0):
[ 18.148815][ T1] [<ffffffff810d9c14>] __usermodehelper_disable+0x80/0x17f
[ 18.149243][ T1] stacktrace:
[ 18.149438][ T1] __dept_wait+0x115/0x15b
[ 18.149726][ T1] dept_wait+0xcd/0xf3
[ 18.149993][ T1] down_write+0x4e/0x82
[ 18.150266][ T1] __usermodehelper_disable+0x80/0x17f
[ 18.150615][ T1] kernel_halt+0x33/0x5d
[ 18.150893][ T1] __do_sys_reboot+0x197/0x24f
[ 18.151201][ T1] do_syscall_64+0xd4/0xfb
[ 18.151489][ T1] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 18.151866][ T1]
[ 18.152004][ T1] [E] mutex_unlock(system_transition_mutex:0):
[ 18.152368][ T1] (N/A)
[ 18.152532][ T1] ---------------------------------------------------
[ 18.152931][ T1] context B's detail
[ 18.153161][ T1] ---------------------------------------------------
[ 18.153559][ T1] context B
[ 18.153743][ T1] [S] (unknown)(&fw_st->completion:0)
[ 18.154082][ T1] [W] __mutex_lock_common(system_transition_mutex:0)
[ 18.154496][ T1] [E] complete_all(&fw_st->completion:0)
[ 18.154848][ T1]
[ 18.154987][ T1] [S] (unknown)(&fw_st->completion:0):
[ 18.155310][ T1] (N/A)
[ 18.155473][ T1]
[ 18.155612][ T1] [W] __mutex_lock_common(system_transition_mutex:0):
[ 18.156014][ T1] [<ffffffff810ee14c>] __do_sys_reboot+0x11f/0x24f
[ 18.156400][ T1] stacktrace:
[ 18.156594][ T1] __mutex_lock+0x1ce/0x3f3
[ 18.156887][ T1] __do_sys_reboot+0x11f/0x24f
[ 18.157196][ T1] do_syscall_64+0xd4/0xfb
[ 18.157482][ T1] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 18.157856][ T1]
[ 18.157993][ T1] [E] complete_all(&fw_st->completion:0):
[ 18.158330][ T1] [<ffffffff81c04230>] kill_pending_fw_fallback_reqs+0x66/0x95
[ 18.158778][ T1] stacktrace:
[ 18.158973][ T1] complete_all+0x28/0x58
[ 18.159256][ T1] kill_pending_fw_fallback_reqs+0x66/0x95
[ 18.159624][ T1] fw_shutdown_notify+0x7/0xa
[ 18.159929][ T1] notifier_call_chain+0x4f/0x81
[ 18.160246][ T1] blocking_notifier_call_chain+0x4c/0x64
[ 18.160611][ T1] kernel_halt+0x13/0x5d
[ 18.160889][ T1] __do_sys_reboot+0x197/0x24f
[ 18.161197][ T1] do_syscall_64+0xd4/0xfb
[ 18.161485][ T1] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 18.161861][ T1] ---------------------------------------------------
[ 18.162260][ T1] context C's detail
[ 18.162490][ T1] ---------------------------------------------------
[ 18.162889][ T1] context C
[ 18.163073][ T1] [S] down_read(umhelper_sem:0)
[ 18.163379][ T1] [W] wait_for_completion_killable_timeout(&fw_st->completion:0)
[ 18.163855][ T1] [E] up_read(umhelper_sem:0)
[ 18.164150][ T1]
[ 18.164288][ T1] [S] down_read(umhelper_sem:0):
[ 18.164578][ T1] [<ffffffff810d8f99>] usermodehelper_read_lock_wait+0xad/0x139
[ 18.165027][ T1] stacktrace:
[ 18.165220][ T1] down_read+0x74/0x85
[ 18.165487][ T1] usermodehelper_read_lock_wait+0xad/0x139
[ 18.165860][ T1] firmware_fallback_sysfs+0x118/0x521
[ 18.166207][ T1] _request_firmware+0x7ef/0x91b
[ 18.166525][ T1] request_firmware_work_func+0xb1/0x13b
[ 18.166884][ T1] process_one_work+0x4c3/0x771
[ 18.167199][ T1] worker_thread+0x37f/0x49e
[ 18.167497][ T1] kthread+0x122/0x131
[ 18.167768][ T1] ret_from_fork+0x1f/0x30
[ 18.168055][ T1]
[ 18.168192][ T1] [W] wait_for_completion_killable_timeout(&fw_st->completion:0):
[ 18.168650][ T1] [<ffffffff81c046b7>] firmware_fallback_sysfs+0x42a/0x521
[ 18.169076][ T1] stacktrace:
[ 18.169270][ T1] wait_for_completion_killable_timeout+0x3c/0x58
[ 18.169676][ T1] firmware_fallback_sysfs+0x42a/0x521
[ 18.170024][ T1] _request_firmware+0x7ef/0x91b
[ 18.170341][ T1] request_firmware_work_func+0xb1/0x13b
[ 18.170699][ T1] process_one_work+0x4c3/0x771
[ 18.171012][ T1] worker_thread+0x37f/0x49e
[ 18.171309][ T1] kthread+0x122/0x131
[ 18.171575][ T1] ret_from_fork+0x1f/0x30
[ 18.171863][ T1]
[ 18.172001][ T1] [E] up_read(umhelper_sem:0):
[ 18.172281][ T1] (N/A)
[ 18.172445][ T1] ---------------------------------------------------
[ 18.172844][ T1] information that might be helpful
[ 18.173151][ T1] ---------------------------------------------------
[ 18.173549][ T1] CPU: 0 PID: 1 Comm: init Tainted: G B 5.18.0-rc3-57979-gc2b89afca919 #2374
[ 18.174144][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 18.174687][ T1] Call Trace:
[ 18.174882][ T1] <TASK>
[ 18.175056][ T1] print_circle+0x6a2/0x6f9
[ 18.175326][ T1] ? extend_queue+0xf4/0xf4
[ 18.175594][ T1] ? __list_add_valid+0xce/0xf6
[ 18.175886][ T1] ? __list_add+0x45/0x4e
[ 18.176144][ T1] ? print_circle+0x6f9/0x6f9
[ 18.176422][ T1] cb_check_dl+0xc0/0xc7
[ 18.176675][ T1] bfs+0x1c8/0x27b
[ 18.176897][ T1] ? disconnect_class+0x24c/0x24c
[ 18.177195][ T1] ? __list_add+0x45/0x4e
[ 18.177454][ T1] add_dep+0x140/0x217
[ 18.177697][ T1] ? add_ecxt+0x33b/0x33b
[ 18.177955][ T1] ? llist_del_first+0xc/0x46
[ 18.178234][ T1] add_wait+0x58a/0x723
[ 18.178482][ T1] ? __thaw_task+0x3e/0x3e
[ 18.178745][ T1] ? add_dep+0x217/0x217
[ 18.178998][ T1] ? llist_add_batch+0x33/0x4c
[ 18.179281][ T1] ? check_new_class+0x139/0x30d
[ 18.179574][ T1] __dept_wait+0x115/0x15b
[ 18.179837][ T1] ? __usermodehelper_disable+0x80/0x17f
[ 18.180170][ T1] ? add_wait+0x723/0x723
[ 18.180426][ T1] ? lock_release+0x338/0x338
[ 18.180704][ T1] ? __usermodehelper_disable+0x80/0x17f
[ 18.181037][ T1] dept_wait+0xcd/0xf3
[ 18.181280][ T1] down_write+0x4e/0x82
[ 18.181527][ T1] ? __usermodehelper_disable+0x80/0x17f
[ 18.181861][ T1] __usermodehelper_disable+0x80/0x17f
[ 18.182184][ T1] ? __usermodehelper_set_disable_depth+0x3a/0x3a
[ 18.182565][ T1] ? dept_ecxt_exit+0x1c9/0x1f7
[ 18.182854][ T1] ? blocking_notifier_call_chain+0x57/0x64
[ 18.183205][ T1] kernel_halt+0x33/0x5d
[ 18.183458][ T1] __do_sys_reboot+0x197/0x24f
[ 18.183742][ T1] ? kernel_power_off+0x6b/0x6b
[ 18.184033][ T1] ? dept_on+0x27/0x33
[ 18.184275][ T1] ? dept_exit+0x38/0x42
[ 18.184526][ T1] ? dept_on+0x27/0x33
[ 18.184767][ T1] ? dept_on+0x27/0x33
[ 18.185008][ T1] ? dept_exit+0x38/0x42
[ 18.185260][ T1] ? dept_enirq_transition+0x25a/0x295
[ 18.185582][ T1] ? syscall_enter_from_user_mode+0x47/0x71
[ 18.185930][ T1] ? dept_aware_softirqs_disable+0x1e/0x1e
[ 18.186274][ T1] ? syscall_enter_from_user_mode+0x47/0x71
[ 18.186622][ T1] do_syscall_64+0xd4/0xfb
[ 18.186883][ T1] ? asm_exc_page_fault+0x1e/0x30
[ 18.187180][ T1] ? dept_aware_softirqs_disable+0x1e/0x1e
[ 18.187526][ T1] ? dept_kernel_enter+0x15/0x1e
[ 18.187821][ T1] ? do_syscall_64+0x13/0xfb
[ 18.188094][ T1] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 18.188443][ T1] RIP: 0033:0x4485f7
[ 18.188672][ T1] Code: 00 75 05 48 83 c4 28 c3 e8 26 17 00 00 66 0f 1f 44 00 00 f3 0f 1e fa 89 fa be 69 19 12 28 bf ad de e1 fe b8 a9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 c7 c2 b8 ff ff ff f7 d8 64 89 02 b8
[ 18.189822][ T1] RSP: 002b:00007ffc92206f28 EFLAGS: 00000202 ORIG_RAX: 00000000000000a9
[ 18.190316][ T1] RAX: ffffffffffffffda RBX: 00007ffc92207118 RCX: 00000000004485f7
[ 18.190784][ T1] RDX: 000000004321fedc RSI: 0000000028121969 RDI: 00000000fee1dead
[ 18.191254][ T1] RBP: 00007ffc92206f30 R08: 00000000016376a0 R09: 0000000000000000
[ 18.191722][ T1] R10: 00000000004c3820 R11: 0000000000000202 R12: 0000000000000001
[ 18.192194][ T1] R13: 00007ffc92207108 R14: 00000000004bf8d0 R15: 0000000000000001
[ 18.192667][ T1] </TASK>


2022-05-23 06:56:03

by Byungchul Park

[permalink] [raw]
Subject: Re: [REPORT] syscall reboot + umh + firmware fallback

On Thu, May 12, 2022 at 09:56:46AM -0400, Theodore Ts'o wrote:
> On Thu, May 12, 2022 at 08:18:24PM +0900, Byungchul Park wrote:
> > I have a question about this one. Yes, it would never been stuck thanks
> > to timeout. However, IIUC, timeouts are not supposed to expire in normal
> > cases. So I thought a timeout expiration means not a normal case so need
> > to inform it in terms of dependency so as to prevent further expiraton.
> > That's why I have been trying to track even timeout'ed APIs.
>
> As I beleive I've already pointed out to you previously in ext4 and
> ocfs2, the jbd2 timeout every five seconds happens **all** the time
> while the file system is mounted. Commits more frequently than five
> seconds is the exception case, at least for desktops/laptop workloads.

Thanks, Ted. It's easy to stop tracking APIs with timeout. I've been
just afraid that the cases that we want to suppress anyway will be
skipped.

However, I should stop it if it produces too many false alarms.

> We *don't* get to the timeout only when a userspace process calls
> fsync(2), or if the journal was incorrectly sized by the system
> administrator so that it's too small, and the workload has so many
> file system mutations that we have to prematurely close the
> transaction ahead of the 5 second timeout.

Yeah... It's how journaling works. Thanks.

> > Do you think DEPT shouldn't track timeout APIs? If I was wrong, I
> > shouldn't track the timeout APIs any more.
>
> DEPT tracking timeouts will cause false positives in at least some
> cases. At the very least, there needs to be an easy way to suppress
> these false positives on a per wait/mutex/spinlock basis.

The easy way is to stop tracking those that are along with timeout until
DEPT starts to consider waits/events by timeout functionality itself.

Thanks.

Byungchul
>
> - Ted