Hello,
Just took a look out of curiosity.
On Thu, May 12, 2022 at 02:25:57PM +0900, Byungchul Park wrote:
> PROCESS A PROCESS B WORKER C
>
> __do_sys_reboot()
> __do_sys_reboot()
> mutex_lock(&system_transition_mutex)
> ... mutex_lock(&system_transition_mutex) <- stuck
> ...
> request_firmware_work_func()
> _request_firmware()
> firmware_fallback_sysfs()
> usermodehelper_read_lock_wait()
> down_read(&umhelper_sem)
> ...
> fw_load_sysfs_fallback()
> fw_sysfs_wait_timeout()
> wait_for_completion_killable_timeout(&fw_st->completion) <- stuck
> kernel_halt()
> __usermodehelper_disable()
> down_write(&umhelper_sem) <- stuck
>
> --------------------------------------------------------
> All the 3 contexts are stuck at this point.
> --------------------------------------------------------
>
> PROCESS A PROCESS B WORKER C
>
> ...
> up_write(&umhelper_sem)
> ...
> mutex_unlock(&system_transition_mutex) <- cannot wake up B
>
> ...
> kernel_halt()
> notifier_call_chain()
> hw_shutdown_notify()
> kill_pending_fw_fallback_reqs()
> __fw_load_abort()
> complete_all(&fw_st->completion) <- cannot wake up C
>
> ...
> usermodeheler_read_unlock()
> up_read(&umhelper_sem) <- cannot wake up A
I'm not sure I'm reading it correctly but it looks like "process B" column
is superflous given that it's waiting on the same lock to do the same thing
that A is already doing (besides, you can't really halt the machine twice).
What it's reporting seems to be ABBA deadlock between A waiting on
umhelper_sem and C waiting on fw_st->completion. The report seems spurious:
1. wait_for_completion_killable_timeout() doesn't need someone to wake it up
to make forward progress because it will unstick itself after timeout
expires.
2. complete_all() from __fw_load_abort() isn't the only source of wakeup.
The fw loader can be, and mainly should be, woken up by firmware loading
actually completing instead of being aborted.
I guess the reason why B shows up there is because the operation order is
such that just between A and C, the complete_all() takes place before
__usermodehlper_disable(), so the whole thing kinda doesn't make sense as
you can't block a past operation by a future one. Inserting process B
introduces the reverse ordering.
Thanks.
--
tejun
Tejun wrote:
> Hello,
Hello,
> I'm not sure I'm reading it correctly but it looks like "process B" column
I think you're interpreting the report correctly.
> is superflous given that it's waiting on the same lock to do the same thing
> that A is already doing (besides, you can't really halt the machine twice).
Indeed! I've been in a daze. I thought kernel_halt() can be called twice
by two different purposes. Sorry for the noise.
> What it's reporting seems to be ABBA deadlock between A waiting on
> umhelper_sem and C waiting on fw_st->completion. The report seems spurious:
>
> 1. wait_for_completion_killable_timeout() doesn't need someone to wake it up
> to make forward progress because it will unstick itself after timeout
> expires.
I have a question about this one. Yes, it would never been stuck thanks
to timeout. However, IIUC, timeouts are not supposed to expire in normal
cases. So I thought a timeout expiration means not a normal case so need
to inform it in terms of dependency so as to prevent further expiraton.
That's why I have been trying to track even timeout'ed APIs.
Do you think DEPT shouldn't track timeout APIs? If I was wrong, I
shouldn't track the timeout APIs any more.
> 2. complete_all() from __fw_load_abort() isn't the only source of wakeup.
> The fw loader can be, and mainly should be, woken up by firmware loading
> actually completing instead of being aborted.
This is the point I'd like to ask. In normal cases, fw_load_done() might
happen, of course, if the loading gets completed. However, I was
wondering if the kernel ensures either fw_load_done() or fw_load_abort()
to be called by *another* context while kernel_halt().
> Thanks.
Thank you very much!
Byungchul
>
> --
> tejun
>
Hello,
On Thu, May 12, 2022 at 08:18:24PM +0900, Byungchul Park wrote:
> > 1. wait_for_completion_killable_timeout() doesn't need someone to wake it up
> > to make forward progress because it will unstick itself after timeout
> > expires.
>
> I have a question about this one. Yes, it would never been stuck thanks
> to timeout. However, IIUC, timeouts are not supposed to expire in normal
> cases. So I thought a timeout expiration means not a normal case so need
> to inform it in terms of dependency so as to prevent further expiraton.
> That's why I have been trying to track even timeout'ed APIs.
>
> Do you think DEPT shouldn't track timeout APIs? If I was wrong, I
> shouldn't track the timeout APIs any more.
Without actually surveying the use cases, I can't say for sure but my
experience has been that we often get pretty creative with timeouts and it's
something people actively think about and monitor (and it's usually not
subtle). Given that, I'm skeptical about how much value it'd add for a
dependency checker to warn about timeouts. It might be net negative than the
other way around.
> > 2. complete_all() from __fw_load_abort() isn't the only source of wakeup.
> > The fw loader can be, and mainly should be, woken up by firmware loading
> > actually completing instead of being aborted.
>
> This is the point I'd like to ask. In normal cases, fw_load_done() might
> happen, of course, if the loading gets completed. However, I was
> wondering if the kernel ensures either fw_load_done() or fw_load_abort()
> to be called by *another* context while kernel_halt().
We'll have to walk through the code to tell that. On a cursory look tho, up
until that point (just before shutting down usermode helper), I don't see
anything which would actively block firmware loading.
Thanks.
--
tejun
On Thu, May 12, 2022 at 08:18:24PM +0900, Byungchul Park wrote:
> I have a question about this one. Yes, it would never been stuck thanks
> to timeout. However, IIUC, timeouts are not supposed to expire in normal
> cases. So I thought a timeout expiration means not a normal case so need
> to inform it in terms of dependency so as to prevent further expiraton.
> That's why I have been trying to track even timeout'ed APIs.
As I beleive I've already pointed out to you previously in ext4 and
ocfs2, the jbd2 timeout every five seconds happens **all** the time
while the file system is mounted. Commits more frequently than five
seconds is the exception case, at least for desktops/laptop workloads.
We *don't* get to the timeout only when a userspace process calls
fsync(2), or if the journal was incorrectly sized by the system
administrator so that it's too small, and the workload has so many
file system mutations that we have to prematurely close the
transaction ahead of the 5 second timeout.
> Do you think DEPT shouldn't track timeout APIs? If I was wrong, I
> shouldn't track the timeout APIs any more.
DEPT tracking timeouts will cause false positives in at least some
cases. At the very least, there needs to be an easy way to suppress
these false positives on a per wait/mutex/spinlock basis.
- Ted