2020-07-15 13:08:09

by Brent Lu

[permalink] [raw]
Subject: [PATCH] ASoC: hdac_hdmi: remove cancel_work_sync in runtime suspend

A deadlock is identified when there are three contexts running at the
same time:
- a HDMI jack work which is calling snd_soc_dapm_sync().
- user space is calling snd_pcm_release() to close pcm device.
- pm is calling runtime suspend function of HDMI codec driver.

By removing the clear_dapm_works() invocation in the
hdac_hdmi_runtime_suspend() function, the snd_pcm_release() could
always returns from dapm_power_widgets() function call without
blocking the hdac_hdmi_jack_dapm_work() work thread or being blocked
by the hdac_hdmi_runtime_suspend() function. The purpose of the jack
work is to enable/disable the dapm jack pin so it's not necessary to
cancel the work in runtime suspend function which is usually called
when pcm device is closed.

Signed-off-by: Brent Lu <[email protected]>
---
sound/soc/codecs/hdac_hdmi.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/sound/soc/codecs/hdac_hdmi.c b/sound/soc/codecs/hdac_hdmi.c
index f26b77f..9052edd 100644
--- a/sound/soc/codecs/hdac_hdmi.c
+++ b/sound/soc/codecs/hdac_hdmi.c
@@ -2097,8 +2097,6 @@ static int hdac_hdmi_runtime_suspend(struct device *dev)
if (!bus)
return 0;

- clear_dapm_works(hdev);
-
/*
* Power down afg.
* codec_read is preferred over codec_write to set the power state.
--
2.7.4


2020-07-15 13:22:58

by Brent Lu

[permalink] [raw]
Subject: RE: [PATCH] ASoC: hdac_hdmi: remove cancel_work_sync in runtime suspend

> A deadlock is identified when there are three contexts running at the same
> time:
> - a HDMI jack work which is calling snd_soc_dapm_sync().
> - user space is calling snd_pcm_release() to close pcm device.
> - pm is calling runtime suspend function of HDMI codec driver.
>
> By removing the clear_dapm_works() invocation in the
> hdac_hdmi_runtime_suspend() function, the snd_pcm_release() could
> always returns from dapm_power_widgets() function call without blocking
> the hdac_hdmi_jack_dapm_work() work thread or being blocked by the
> hdac_hdmi_runtime_suspend() function. The purpose of the jack work is to
> enable/disable the dapm jack pin so it's not necessary to cancel the work in
> runtime suspend function which is usually called when pcm device is closed.
>
> Signed-off-by: Brent Lu <[email protected]>
> ---

It is found on CML Chromebox 'Puff' which is using Chrome-v4.19 branch. Kernel
resets itself because tasks getting stuck in mutex for 2 mins. Following is the
reproduce steps:

1. Connect to external HDMI display.
2. Open web page "https://bitmovin.com/demos/drm"
3. Reload the web page multiple times until device reboot

snd_pcm_release()
- In dapm_power_widgets() waitng for dapm_pre_sequence_async() to
complete.
- Holding the card->dapm_mutex when calling
snd_soc_dapm_stream_event() function.
- Call trace:
[ 1352.228063] __schedule+0x4b8/0xf5c
[ 1352.228074] schedule+0x3f/0x7a
[ 1352.228083] async_synchronize_cookie_domain+0xb5/0x11c
[ 1352.228091] ? init_wait_entry+0x2e/0x2e
[ 1352.228101] dapm_power_widgets+0x1eb/0x3b5
[ 1352.228111] snd_soc_dapm_stream_event+0x80/0x92
[ 1352.228121] dpcm_dapm_stream_event+0x58/0x7c
[ 1352.228130] dpcm_fe_dai_close+0xa3/0x118
[ 1352.228141] snd_pcm_release_substream+0x72/0xbc
[ 1352.228148] snd_pcm_release+0x3e/0x9b
[ 1352.228156] __fput+0xfa/0x1ef
[ 1352.228166] task_work_run+0x7f/0xa7
[ 1352.228174] exit_to_usermode_loop+0xa5/0xa7
[ 1352.228182] do_syscall_64+0xfb/0x14b
[ 1352.228190] entry_SYSCALL_64_after_hwframe+0x44/0xa9

dapm_pre_sequence_async()
- In rpm_resume() waiting for hdac_hdmi_runtime_suspend() to complete.
- Call trace:
[ 1352.227903] Workqueue: events_unbound async_run_entry_fn
[ 1352.227908] Call Trace:
[ 1352.227917] __schedule+0x4b8/0xf5c
[ 1352.227926] schedule+0x3f/0x7a
[ 1352.227934] rpm_resume+0x16e/0x5ee
[ 1352.227944] ? init_wait_entry+0x2e/0x2e
[ 1352.227951] __pm_runtime_resume+0x6b/0x83
[ 1352.227959] dapm_pre_sequence_async+0x2c/0x8f
[ 1352.227967] async_run_entry_fn+0x3d/0xd1
[ 1352.227975] process_one_work+0x19b/0x4c9
[ 1352.227983] worker_thread+0x10d/0x284
[ 1352.227991] kthread+0x138/0x140
[ 1352.227998] ? process_one_work+0x4c9/0x4c9
[ 1352.228006] ? kthread_associate_blkcg+0xbe/0xbe
[ 1352.228012] ret_from_fork+0x1f/0x40

hdac_hdmi_runtime_suspend()
- In clear_dapm_works() waiting for hdac_hdmi_jack_dapm_work() to
complete.
- Call trace:
[ 1352.228359] Workqueue: pm pm_runtime_work
[ 1352.228379] Call Trace:
[ 1352.228396] __schedule+0x4b8/0xf5c
[ 1352.228410] ? delete_node+0x74/0x1ab
[ 1352.228426] schedule+0x3f/0x7a
[ 1352.228442] schedule_timeout+0x11f/0x176
[ 1352.228461] do_wait_for_common+0x118/0x164
[ 1352.228477] ? console_conditional_schedule+0x28/0x28
[ 1352.228494] ? queue_core_balance+0x75/0x75
[ 1352.228509] wait_for_completion+0x4c/0x58
[ 1352.228526] __flush_work+0x112/0x1a1
[ 1352.228540] ? insert_wq_barrier+0x118/0x118
[ 1352.228556] __cancel_work_timer+0x9f/0x1ad
[ 1352.228569] ? __switch_to_asm+0x41/0x70
[ 1352.228582] ? __switch_to_asm+0x35/0x70
[ 1352.228596] ? __switch_to_asm+0x41/0x70
[ 1352.228613] clear_dapm_works+0x3f/0x60 [snd_soc_hdac_hdmi]
[ 1352.228627] ? hdmi_codec_resume+0x40/0x40 [snd_soc_hdac_hdmi]
[ 1352.228642] hdac_hdmi_runtime_suspend+0x20/0xac [snd_soc_hdac_hdmi]
[ 1352.228654] __rpm_callback+0x8a/0xf3
[ 1352.228669] ? hdmi_codec_resume+0x40/0x40 [snd_soc_hdac_hdmi]
[ 1352.228680] rpm_suspend+0x3b4/0x73e
[ 1352.228696] pm_runtime_work+0x5d/0xa0
[ 1352.228708] process_one_work+0x19b/0x4c9
[ 1352.228717] worker_thread+0x10d/0x284
[ 1352.228727] kthread+0x138/0x140
[ 1352.228736] ? process_one_work+0x4c9/0x4c9
[ 1352.228746] ? kthread_associate_blkcg+0xbe/0xbe
[ 1352.228761] ret_from_fork+0x1f/0x40

hdac_hdmi_jack_dapm_work()
- In snd_soc_dapm_sync() waiting for waiting for card->dapm_mutex to
be released.
- The mutex is held by snd_pcm_release(), deadlock here.
- Call trace:
[ 1352.228835] Workqueue: events hdac_hdmi_jack_dapm_work [snd_soc_hdac_hdmi]
[ 1352.228845] Call Trace:
[ 1352.228860] __schedule+0x4b8/0xf5c
[ 1352.228878] ? mutex_spin_on_owner+0x86/0x94
[ 1352.228896] schedule_preempt_disabled+0x47/0x8a
[ 1352.228911] __mutex_lock_common+0x25c/0x52c
[ 1352.228929] __mutex_lock_slowpath+0x20/0x26
[ 1352.228946] snd_soc_dapm_sync+0x1b/0x4d
[ 1352.228960] process_one_work+0x19b/0x4c9
[ 1352.228976] worker_thread+0x1e4/0x284
[ 1352.228994] kthread+0x138/0x140
[ 1352.229007] ? process_one_work+0x4c9/0x4c9
[ 1352.229021] ? kthread_associate_blkcg+0xbe/0xbe
[ 1352.229036] ret_from_fork+0x1f/0x40


Regards,
Brent

> sound/soc/codecs/hdac_hdmi.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/sound/soc/codecs/hdac_hdmi.c
> b/sound/soc/codecs/hdac_hdmi.c index f26b77f..9052edd 100644
> --- a/sound/soc/codecs/hdac_hdmi.c
> +++ b/sound/soc/codecs/hdac_hdmi.c
> @@ -2097,8 +2097,6 @@ static int hdac_hdmi_runtime_suspend(struct
> device *dev)
> if (!bus)
> return 0;
>
> - clear_dapm_works(hdev);
> -
> /*
> * Power down afg.
> * codec_read is preferred over codec_write to set the power state.
> --
> 2.7.4

2020-10-02 21:08:54

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH] ASoC: hdac_hdmi: remove cancel_work_sync in runtime suspend

On Wed, 15 Jul 2020 21:01:50 +0800, Brent Lu wrote:
> A deadlock is identified when there are three contexts running at the
> same time:
> - a HDMI jack work which is calling snd_soc_dapm_sync().
> - user space is calling snd_pcm_release() to close pcm device.
> - pm is calling runtime suspend function of HDMI codec driver.
>
> By removing the clear_dapm_works() invocation in the
> hdac_hdmi_runtime_suspend() function, the snd_pcm_release() could
> always returns from dapm_power_widgets() function call without
> blocking the hdac_hdmi_jack_dapm_work() work thread or being blocked
> by the hdac_hdmi_runtime_suspend() function. The purpose of the jack
> work is to enable/disable the dapm jack pin so it's not necessary to
> cancel the work in runtime suspend function which is usually called
> when pcm device is closed.

Applied to

https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: hdac_hdmi: remove cancel_work_sync in runtime suspend
commit: cec6e41ce094affad3b5f9f5e5aa1f81c66ce682

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark