2019-02-05 21:17:18

by Krzysztof Kozlowski

[permalink] [raw]
Subject: [BUG BISECT] NULL pointer after commit "ASoC: dapm: Only power up active channels from a DAI"

Hi,

Bisect pointed to commit:
commit 078a85f2806f0ffd11289009462a6a390f9adb5c
Author: Charles Keepax <[email protected]>
Date: Thu Jan 31 13:30:18 2019 +0000
ASoC: dapm: Only power up active channels from a DAI

as a bad commit for NULL pointer on my Odroid XU3 and Odroid U3 board when
doing "aplay /usr/share/sounds/alsa/Front_Right.wav".

1. Arch ARM Linux
2. exynos_defconfig
3. Odroid U3, XU3, Exynos SoC, ARMv7

Last address is in calltrace:
c079552c
dapm_update_dai_unlocked
sound/soc/soc-dapm.c:2586

Full log:

Unable to handle kernel NULL pointer dereference at virtual address 0000007c
pgd = d5b32ef7
[0000007c] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT SMP ARM
Modules linked in: s5p_mfc s5p_jpeg exynos_gsc v4l2_mem2mem v4l2_common videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common videodev media s5p_cec
CPU: 7 PID: 359 Comm: aplay Not tainted 5.0.0-rc4-00177-g078a85f2806f #32
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
PC is at dapm_update_dai_unlocked+0x38/0x138
LR is at snd_soc_dapm_update_dai+0x34/0x4c
pc : [<c079552c>] lr : [<c0795660>] psr: 60010013
sp : eb9d9be8 ip : eabc8b64 fp : ecce0e10
r10: c100b448 r9 : ecce0e0c r8 : 00000000
r7 : 00000000 r6 : 00000000 r5 : 00000002 r4 : ecce0800
r3 : 00000000 r2 : eccaa080 r1 : ecdab000 r0 : 00000000
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: 6c09806a DAC: 00000051
Process aplay (pid: 359, stack limit = 0x4ae0a814)
Stack: (0xeb9d9be8 to 0xeb9da000)
9be0: ecce0e0c c100b448 ecce0e10 ecce0800 ecd64400 ecdab000
9c00: eccaa080 c0795660 ee3c480c ecd64400 ecce0800 ecdab000 00000000 c079d760
9c20: eccaa080 ecce080c 00000000 00000001 00000000 00000000 00000000 00000000
9c40: 00000000 00000000 00000000 00000004 00000000 00000000 00000000 00000000
9c60: 00000000 00000000 00000000 00000001 00000000 00000000 00000000 00000000
9c80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9ca0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9cc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9ce0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9d00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9d20: 00000000 00000000 00000000 00000010 00000010 00000004 00000020 00000020
9d40: 00000004 00000002 00000002 00000004 0000bb80 0000bb80 00000004 0001e848
9d60: 0001e848 00000004 00001770 00001770 00000004 00005dc0 00005dc0 00000004
9d80: 00000004 00000004 00000004 0007a120 0007a120 00000004 00005dc0 00005dc0
9da0: 00000004 00017700 00017700 00000004 00000000 00000000 00000004 00000000
9dc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9de0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9e00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9e20: 00000000 00000000 00000000 000fff07 00200103 00000010 0000bb80 00000001
9e40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9e60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9e80: 00000000 8c6972bb ecdab000 ecd64400 ecdab000 00000000 c0b97b1c ebfe8000
9ea0: 00000000 bea4eea0 b6effb88 c07825e4 c100b448 ecdab000 ecd64400 00000000
9ec0: ecfdd180 ec1bde40 bea4eea0 c07836e0 eb9d9f78 00000000 00000000 00000001
9ee0: 61746164 c0294d70 00000001 8c6972bb c100b448 bea4eea0 ecfdd180 c02a9bac
9f00: bea4eea0 eb9d8000 ed4aafa8 c02a9244 eb9cbb80 c100b448 00000000 00000000
9f20: 00000000 00000000 00000000 8c6972bb eb9d9fb0 ec6f11c0 eb9cbb80 00000002
9f40: eb9cbb88 00000001 b6e85b6f 00000001 61746164 c0295028 00000000 00000000
9f60: c100b448 8c6972bb 00000000 ecfdd180 ecfdd180 00000004 c25c4111 bea4eea0
9f80: eb9d8000 00000036 b6effb88 c02a9bac 004cf1b8 bea4eea0 004cf208 00000036
9fa0: c01011c4 c0101000 004cf1b8 bea4eea0 00000004 c25c4111 bea4eea0 00020001
9fc0: 004cf1b8 bea4eea0 004cf208 00000036 bea4ee10 00000000 b6fb5900 b6effb88
9fe0: b6f6c82c bea4edf4 b6efbfc4 b6e0bbdc 200d0010 00000004 00000000 00000000
[<c079552c>] (dapm_update_dai_unlocked) from [<c0795660>] (snd_soc_dapm_update_dai+0x34/0x4c)
[<c0795660>] (snd_soc_dapm_update_dai) from [<c079d760>] (soc_pcm_hw_params+0x22c/0x624)
[<c079d760>] (soc_pcm_hw_params) from [<c07825e4>] (snd_pcm_hw_params+0x128/0x378)
[<c07825e4>] (snd_pcm_hw_params) from [<c07836e0>] (snd_pcm_ioctl+0x26c/0x1258)
[<c07836e0>] (snd_pcm_ioctl) from [<c02a9244>] (do_vfs_ioctl+0xb0/0x9e4)
[<c02a9244>] (do_vfs_ioctl) from [<c02a9bac>] (ksys_ioctl+0x34/0x58)
[<c02a9bac>] (ksys_ioctl) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
Exception stack(0xeb9d9fa8 to 0xeb9d9ff0)
9fa0: 004cf1b8 bea4eea0 00000004 c25c4111 bea4eea0 00020001
9fc0: 004cf1b8 bea4eea0 004cf208 00000036 bea4ee10 00000000 b6fb5900 b6effb88
9fe0: b6f6c82c bea4edf4 b6efbfc4 b6e0bbdc
Code: e3530000 e5926020 1a00003a e1a07006 (e5b7407c)
---[ end trace 8561600efcd2b2f1 ]---

Let me know if you need any other data.


Best regards,
Krzysztof



2019-02-06 10:01:30

by Sylwester Nawrocki

[permalink] [raw]
Subject: Re: [BUG BISECT] NULL pointer after commit "ASoC: dapm: Only power up active channels from a DAI"

Hi Krzysztof,

Cc: Charles

On 2/5/19 22:16, Krzysztof Kozlowski wrote:
> Bisect pointed to commit:
> commit 078a85f2806f0ffd11289009462a6a390f9adb5c
> Author: Charles Keepax <[email protected]>
> Date: Thu Jan 31 13:30:18 2019 +0000
> ASoC: dapm: Only power up active channels from a DAI
>
> as a bad commit for NULL pointer on my Odroid XU3 and Odroid U3 board when
> doing "aplay /usr/share/sounds/alsa/Front_Right.wav".
>
> 1. Arch ARM Linux
> 2. exynos_defconfig
> 3. Odroid U3, XU3, Exynos SoC, ARMv7
>
> Last address is in calltrace:
> c079552c
> dapm_update_dai_unlocked
> sound/soc/soc-dapm.c:2586

Thanks for bisecting this, I ran into same issue last night and I'm starting
to debug this now. I have added some debug prints and it looks like it oopses
on NULL playback_widget of the dummy DAI.

[ 30.701182] hdmi-audio-codec hdmi-audio-codec.0.auto: Update DAI routes for i2s-hifi playback
[ 30.709630] dapm_update_dai_unlocked:2586 w=8bd27d28
[ 30.714403] dapm_update_dai_unlocked:2594 w=8bd27d28
[ 30.724688] max98090 5-0010: Update DAI routes for HiFi playback
[ 30.730163] dapm_update_dai_unlocked:2586 w=3fc942af
[ 30.735154] dapm_update_dai_unlocked:2594 w=3fc942af
[ 30.745051] snd-soc-dummy snd-soc-dummy: Update DAI routes for snd-soc-dummy-dai playback
[ 30.753128] dapm_update_dai_unlocked:2586 w= (null)
[ 30.758114] Unable to handle kernel NULL pointer dereference at virtual address 0000007c

--
Regards,
Sylwester

2019-02-06 10:08:23

by Charles Keepax

[permalink] [raw]
Subject: Re: [BUG BISECT] NULL pointer after commit "ASoC: dapm: Only power up active channels from a DAI"

On Tue, Feb 05, 2019 at 10:16:22PM +0100, Krzysztof Kozlowski wrote:
> Bisect pointed to commit:
> commit 078a85f2806f0ffd11289009462a6a390f9adb5c
> Author: Charles Keepax <[email protected]>
> Date: Thu Jan 31 13:30:18 2019 +0000
> ASoC: dapm: Only power up active channels from a DAI
>
> as a bad commit for NULL pointer on my Odroid XU3 and Odroid U3 board when
> doing "aplay /usr/share/sounds/alsa/Front_Right.wav".
>
> 1. Arch ARM Linux
> 2. exynos_defconfig
> 3. Odroid U3, XU3, Exynos SoC, ARMv7
>
> Last address is in calltrace:
> c079552c
> dapm_update_dai_unlocked
> sound/soc/soc-dapm.c:2586
>

Apologies for this one, is a little strange. I am assuming either
we end up with a NULL widget or more likely given the line
number a NULL path/widget on the path. Having some difficulty seeing
how we get into that state though.

Could you confirm what machine driver/DT is being used here? I am
assuming this is arch/arm/boot/dts/exynos4412-odroidu3.dts and
sound/soc/samsung/odroid.c. So will investigate those see if I
can find anything.

And could you try this diff and send through the output:

diff --git a/sound/soc/soc-dapm.c b/sound/soc/soc-dapm.c
index 482ddb825fb59..9ee44b0192874 100644
--- a/sound/soc/soc-dapm.c
+++ b/sound/soc/soc-dapm.c
@@ -2570,16 +2570,37 @@ static int dapm_update_dai_unlocked(struct snd_pcm_substream *substream,
else
w = dai->capture_widget;

+ if (!w) {
+ pr_err("CK -- NULL widget on %s\n", dai->name);
+ return 0;
+ }
+
dev_dbg(dai->dev, "Update DAI routes for %s %s\n", dai->name,
dir == SNDRV_PCM_STREAM_PLAYBACK ? "playback" : "capture");

snd_soc_dapm_widget_for_each_sink_path(w, p) {
+ if (!p) {
+ pr_err("DEBUG -- NULL sink path on %s\n", w->name);
+ continue;
+ }
+ if (!p->sink) {
+ pr_err("DEBUG -- NULL wink widget on %s\n", w->name);
+ continue;
+ }
ret = dapm_update_dai_chan(p, p->sink, channels);
if (ret < 0)
return ret;
}

snd_soc_dapm_widget_for_each_source_path(w, p) {
+ if (!p) {
+ pr_err("DEBUG -- NULL source path on %s\n", w->name);
+ continue;
+ }
+ if (!p->source) {
+ pr_err("DEBUG -- NULL source widget on %s\n", w->name);
+ continue;
+ }
ret = dapm_update_dai_chan(p, p->source, channels);
if (ret < 0)
return ret;

Thanks,
Charles

2019-02-06 10:11:06

by Charles Keepax

[permalink] [raw]
Subject: [PATCH] ASoC: dapm: Check for NULL widget in dapm_update_dai_unlocked

DAIs linked to the dummy will not have an associated playback/capture
widget, so we need to skip the update in that case.

Fixes: 078a85f2806f ("ASoC: dapm: Only power up active channels from a DAI")
Signed-off-by: Charles Keepax <[email protected]>
---

Ok so that all makes sense, this patch is probably the best fix?

Thanks,
Charles

sound/soc/soc-dapm.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/sound/soc/soc-dapm.c b/sound/soc/soc-dapm.c
index 482ddb825fb59..5235d8828758a 100644
--- a/sound/soc/soc-dapm.c
+++ b/sound/soc/soc-dapm.c
@@ -2570,6 +2570,9 @@ static int dapm_update_dai_unlocked(struct snd_pcm_substream *substream,
else
w = dai->capture_widget;

+ if (!w)
+ return 0;
+
dev_dbg(dai->dev, "Update DAI routes for %s %s\n", dai->name,
dir == SNDRV_PCM_STREAM_PLAYBACK ? "playback" : "capture");

--
2.11.0


2019-02-06 10:14:03

by Sylwester Nawrocki

[permalink] [raw]
Subject: Re: [BUG BISECT] NULL pointer after commit "ASoC: dapm: Only power up active channels from a DAI"

On 2/6/19 10:46, Sylwester Nawrocki wrote:
> On 2/5/19 22:16, Krzysztof Kozlowski wrote:
>> Bisect pointed to commit:
>> commit 078a85f2806f0ffd11289009462a6a390f9adb5c
>> Author: Charles Keepax <[email protected]>
>> Date: Thu Jan 31 13:30:18 2019 +0000
>> ASoC: dapm: Only power up active channels from a DAI
>>
>> as a bad commit for NULL pointer on my Odroid XU3 and Odroid U3 board when
>> doing "aplay /usr/share/sounds/alsa/Front_Right.wav".
>>
>> 1. Arch ARM Linux
>> 2. exynos_defconfig
>> 3. Odroid U3, XU3, Exynos SoC, ARMv7
>>
>> Last address is in calltrace:
>> c079552c
>> dapm_update_dai_unlocked
>> sound/soc/soc-dapm.c:2586
>
> Thanks for bisecting this, I ran into same issue last night and I'm starting
> to debug this now. I have added some debug prints and it looks like it oopses
> on NULL playback_widget of the dummy DAI.
>
> [ 30.701182] hdmi-audio-codec hdmi-audio-codec.0.auto: Update DAI routes for i2s-hifi playback
> [ 30.709630] dapm_update_dai_unlocked:2586 w=8bd27d28
> [ 30.714403] dapm_update_dai_unlocked:2594 w=8bd27d28
> [ 30.724688] max98090 5-0010: Update DAI routes for HiFi playback
> [ 30.730163] dapm_update_dai_unlocked:2586 w=3fc942af
> [ 30.735154] dapm_update_dai_unlocked:2594 w=3fc942af
> [ 30.745051] snd-soc-dummy snd-soc-dummy: Update DAI routes for snd-soc-dummy-dai playback
> [ 30.753128] dapm_update_dai_unlocked:2586 w= (null)
> [ 30.758114] Unable to handle kernel NULL pointer dereference at virtual address 0000007c

With a change as below there is no oops and everything works again,
but I'm not sure this is a proper fix.

diff --git a/sound/soc/soc-dapm.c b/sound/soc/soc-dapm.c
index 5b74dffc9c11..111a23a9708a 100644
--- a/sound/soc/soc-dapm.c
+++ b/sound/soc/soc-dapm.c
@@ -2580,6 +2580,9 @@ static int dapm_update_dai_unlocked(struct snd_pcm_substream *substream,
else
w = dai->capture_widget;

+ if (!w)
+ return 0;
+
dev_dbg(dai->dev, "Update DAI routes for %s %s\n", dai->name,
dir == SNDRV_PCM_STREAM_PLAYBACK ? "playback" : "capture");

--
Thanks,
Sylwester

2019-02-06 10:19:15

by Sylwester Nawrocki

[permalink] [raw]
Subject: Re: [PATCH] ASoC: dapm: Check for NULL widget in dapm_update_dai_unlocked

On 2/6/19 11:05, Charles Keepax wrote:
> DAIs linked to the dummy will not have an associated playback/capture
> widget, so we need to skip the update in that case.
>
> Fixes: 078a85f2806f ("ASoC: dapm: Only power up active channels from a DAI")
> Signed-off-by: Charles Keepax <[email protected]>

Tested-by: Sylwester Nawrocki <[email protected]>

> ---
>
> Ok so that all makes sense, this patch is probably the best fix?

It seems so, everything works well with such change, thank you.

> sound/soc/soc-dapm.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/sound/soc/soc-dapm.c b/sound/soc/soc-dapm.c
> index 482ddb825fb59..5235d8828758a 100644
> --- a/sound/soc/soc-dapm.c
> +++ b/sound/soc/soc-dapm.c
> @@ -2570,6 +2570,9 @@ static int dapm_update_dai_unlocked(struct snd_pcm_substream *substream,
> else
> w = dai->capture_widget;
>
> + if (!w)
> + return 0;
> +
> dev_dbg(dai->dev, "Update DAI routes for %s %s\n", dai->name,
> dir == SNDRV_PCM_STREAM_PLAYBACK ? "playback" : "capture");
>

2019-02-06 10:27:21

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: [PATCH] ASoC: dapm: Check for NULL widget in dapm_update_dai_unlocked

On Wed, 6 Feb 2019 at 11:05, Charles Keepax
<[email protected]> wrote:
>
> DAIs linked to the dummy will not have an associated playback/capture
> widget, so we need to skip the update in that case.
>
> Fixes: 078a85f2806f ("ASoC: dapm: Only power up active channels from a DAI")
> Signed-off-by: Charles Keepax <[email protected]>
> ---
>
> Ok so that all makes sense, this patch is probably the best fix?
>
> Thanks,
> Charles

For this particular issue (NULL-pointer):
Reported-by: Krzysztof Kozlowski <[email protected]>
Tested-by: Krzysztof Kozlowski <[email protected]>

However now I see bug sleeping in atomic context:

[ 64.000828] BUG: sleeping function called from invalid context at
../kernel/locking/mutex.c:908
[ 64.008483] in_atomic(): 1, irqs_disabled(): 128, pid: 353, name: aplay
[ 64.014978] 2 locks held by aplay/353:
[ 64.018671] #0: 1fb9b63d (&(&group->lock)->rlock){....}, at:
snd_pcm_action_lock_irq+0x18/0x3c
[ 64.027337] #1: 8b42bfe8 (&(&pri_dai->spinlock)->rlock){....}, at:
i2s_trigger+0x130/0x6c8
[ 64.035654] irq event stamp: 8754
[ 64.038925] hardirqs last enabled at (8753): [<c0a78758>]
_raw_spin_unlock_irq+0x24/0x5c
[ 64.047094] hardirqs last disabled at (8754): [<c0a785a0>]
_raw_spin_lock_irq+0x18/0x50
[ 64.055068] softirqs last enabled at (8096): [<c0102698>]
__do_softirq+0x4f0/0x5e4
[ 64.062680] softirqs last disabled at (8083): [<c012ee94>]
irq_exit+0x160/0x16c
[ 64.069953] Preemption disabled at:
[ 64.069956] [<00000000>] (null)
[ 64.076700] CPU: 6 PID: 353 Comm: aplay Not tainted
5.0.0-rc5-next-20190206-00001-g101ffa564f78 #348
[ 64.085822] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[ 64.091882] [<c0112318>] (unwind_backtrace) from [<c010df2c>]
(show_stack+0x10/0x14)
[ 64.099601] [<c010df2c>] (show_stack) from [<c0a5626c>]
(dump_stack+0x98/0xc4)
[ 64.106788] [<c0a5626c>] (dump_stack) from [<c0156ac8>]
(___might_sleep+0x264/0x2cc)
[ 64.114501] [<c0156ac8>] (___might_sleep) from [<c0a732b4>]
(__mutex_lock+0x40/0xa98)
[ 64.122300] [<c0a732b4>] (__mutex_lock) from [<c0a73d28>]
(mutex_lock_nested+0x1c/0x24)
[ 64.130275] [<c0a73d28>] (mutex_lock_nested) from [<c04c2838>]
(clk_prepare_lock+0x50/0xf8)
[ 64.138596] [<c04c2838>] (clk_prepare_lock) from [<c04c5f48>]
(clk_core_get_rate+0xc/0x60)
[ 64.146824] [<c04c5f48>] (clk_core_get_rate) from [<c07b419c>]
(i2s_trigger+0x444/0x6c8)
[ 64.154883] [<c07b419c>] (i2s_trigger) from [<c079cba0>]
(soc_pcm_trigger+0x100/0x140)
[ 64.162768] [<c079cba0>] (soc_pcm_trigger) from [<c07839c0>]
(snd_pcm_action_single+0x38/0x80)
[ 64.171349] [<c07839c0>] (snd_pcm_action_single) from [<c0783a5c>]
(snd_pcm_action+0x54/0x5c)
[ 64.179841] [<c0783a5c>] (snd_pcm_action) from [<c0783bd4>]
(snd_pcm_action_lock_irq+0x28/0x3c)
[ 64.188508] [<c0783bd4>] (snd_pcm_action_lock_irq) from
[<c07860b0>] (snd_pcm_ioctl+0x920/0x123c)
[ 64.197350] [<c07860b0>] (snd_pcm_ioctl) from [<c02aa6d4>]
(do_vfs_ioctl+0xb0/0x9f8)
[ 64.205054] [<c02aa6d4>] (do_vfs_ioctl) from [<c02ab050>]
(ksys_ioctl+0x34/0x5c)
[ 64.212418] [<c02ab050>] (ksys_ioctl) from [<c0101000>]
(ret_fast_syscall+0x0/0x28)
[ 64.220045] Exception stack(0xe69dffa8 to 0xe69dfff0)
[ 64.225058] ffa0: 004391b8 00001770 00000004
00004142 00439340 00439340
[ 64.233218] ffc0: 004391b8 00001770 00001770 00000036 00001770
00000000 00000000 be8db940
[ 64.241361] ffe0: b6fa382c be8db8ec b6f32a3c b6e42bdc
[ 64.246386]
[ 64.247823] ======================================================
[ 64.253979] WARNING: possible circular locking dependency detected
[ 64.260133] 5.0.0-rc5-next-20190206-00001-g101ffa564f78 #348
Tainted: G W
[ 64.268193] ------------------------------------------------------
[ 64.274342] aplay/353 is trying to acquire lock:
[ 64.278937] 51044846 (prepare_lock){+.+.}, at: clk_prepare_lock+0x50/0xf8
[ 64.285694]
[ 64.285694] but task is already holding lock:
[ 64.291500] 8b42bfe8 (&(&pri_dai->spinlock)->rlock){....}, at:
i2s_trigger+0x130/0x6c8
[ 64.299387]
[ 64.299387] which lock already depends on the new lock.
[ 64.299387]
[ 64.307534]
[ 64.307534] the existing dependency chain (in reverse order) is:
[ 64.314985]
[ 64.314985] -> #1 (&(&pri_dai->spinlock)->rlock){....}:
[ 64.321667] clk_mux_set_parent+0x34/0xb8
[ 64.326162] clk_core_set_parent_nolock+0x21c/0x54c
[ 64.331535] clk_set_parent+0x38/0x6c
[ 64.335696] of_clk_set_defaults+0x11c/0x384
[ 64.340460] of_clk_add_provider+0x8c/0xc8
[ 64.345054] samsung_i2s_probe+0x484/0x64c
[ 64.349651] platform_drv_probe+0x6c/0xa4
[ 64.354153] really_probe+0x280/0x414
[ 64.358311] driver_probe_device+0x78/0x1c4
[ 64.362991] bus_for_each_drv+0x74/0xb8
[ 64.367323] __device_attach+0xd4/0x16c
[ 64.371655] bus_probe_device+0x88/0x90
[ 64.375988] deferred_probe_work_func+0x4c/0xd0
[ 64.381017] process_one_work+0x228/0x810
[ 64.385520] worker_thread+0x30/0x570
[ 64.389681] kthread+0x134/0x164
[ 64.393405] ret_from_fork+0x14/0x20
[ 64.397477] (null)
[ 64.400249]
[ 64.400249] -> #0 (prepare_lock){+.+.}:
[ 64.405539] __mutex_lock+0x7c/0xa98
[ 64.409610] mutex_lock_nested+0x1c/0x24
[ 64.414029] clk_prepare_lock+0x50/0xf8
[ 64.418362] clk_core_get_rate+0xc/0x60
[ 64.422695] i2s_trigger+0x444/0x6c8
[ 64.426768] soc_pcm_trigger+0x100/0x140
[ 64.431188] snd_pcm_action_single+0x38/0x80
[ 64.435953] snd_pcm_action+0x54/0x5c
[ 64.440112] snd_pcm_action_lock_irq+0x28/0x3c
[ 64.445052] snd_pcm_ioctl+0x920/0x123c
[ 64.449386] do_vfs_ioctl+0xb0/0x9f8
[ 64.453457] ksys_ioctl+0x34/0x5c
[ 64.457269] ret_fast_syscall+0x0/0x28
[ 64.461516] 0xbe8db8ec
[ 64.464460]
[ 64.464460] other info that might help us debug this:
[ 64.464460]
[ 64.472438] Possible unsafe locking scenario:
[ 64.472438]
[ 64.478327] CPU0 CPU1
[ 64.482831] ---- ----
[ 64.487336] lock(&(&pri_dai->spinlock)->rlock);
[ 64.492017] lock(prepare_lock);
[ 64.497823]
lock(&(&pri_dai->spinlock)->rlock);
[ 64.505017] lock(prepare_lock);
[ 64.508306]
[ 64.508306] *** DEADLOCK ***
[ 64.508306]
[ 64.514203] 2 locks held by aplay/353:
[ 64.517935] #0: 1fb9b63d (&(&group->lock)->rlock){....}, at:
snd_pcm_action_lock_irq+0x18/0x3c
[ 64.526596] #1: 8b42bfe8 (&(&pri_dai->spinlock)->rlock){....}, at:
i2s_trigger+0x130/0x6c8
[ 64.534915]
[ 64.534915] stack backtrace:
[ 64.539246] CPU: 6 PID: 353 Comm: aplay Tainted: G W
5.0.0-rc5-next-20190206-00001-g101ffa564f78 #348
[ 64.549734] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[ 64.555802] [<c0112318>] (unwind_backtrace) from [<c010df2c>]
(show_stack+0x10/0x14)
[ 64.563515] [<c010df2c>] (show_stack) from [<c0a5626c>]
(dump_stack+0x98/0xc4)
[ 64.570708] [<c0a5626c>] (dump_stack) from [<c01846b0>]
(print_circular_bug+0x210/0x334)
[ 64.578765] [<c01846b0>] (print_circular_bug) from [<c01874ec>]
(__lock_acquire+0x12cc/0x1a5c)
[ 64.587344] [<c01874ec>] (__lock_acquire) from [<c01886d8>]
(lock_acquire+0xe0/0x268)
[ 64.595142] [<c01886d8>] (lock_acquire) from [<c0a732f0>]
(__mutex_lock+0x7c/0xa98)
[ 64.602768] [<c0a732f0>] (__mutex_lock) from [<c0a73d28>]
(mutex_lock_nested+0x1c/0x24)
[ 64.610740] [<c0a73d28>] (mutex_lock_nested) from [<c04c2838>]
(clk_prepare_lock+0x50/0xf8)
[ 64.619059] [<c04c2838>] (clk_prepare_lock) from [<c04c5f48>]
(clk_core_get_rate+0xc/0x60)
[ 64.627291] [<c04c5f48>] (clk_core_get_rate) from [<c07b419c>]
(i2s_trigger+0x444/0x6c8)
[ 64.635349] [<c07b419c>] (i2s_trigger) from [<c079cba0>]
(soc_pcm_trigger+0x100/0x140)
[ 64.643235] [<c079cba0>] (soc_pcm_trigger) from [<c07839c0>]
(snd_pcm_action_single+0x38/0x80)
[ 64.651815] [<c07839c0>] (snd_pcm_action_single) from [<c0783a5c>]
(snd_pcm_action+0x54/0x5c)
[ 64.660306] [<c0783a5c>] (snd_pcm_action) from [<c0783bd4>]
(snd_pcm_action_lock_irq+0x28/0x3c)
[ 64.668972] [<c0783bd4>] (snd_pcm_action_lock_irq) from
[<c07860b0>] (snd_pcm_ioctl+0x920/0x123c)
[ 64.677811] [<c07860b0>] (snd_pcm_ioctl) from [<c02aa6d4>]
(do_vfs_ioctl+0xb0/0x9f8)
[ 64.685522] [<c02aa6d4>] (do_vfs_ioctl) from [<c02ab050>]
(ksys_ioctl+0x34/0x5c)
[ 64.692887] [<c02ab050>] (ksys_ioctl) from [<c0101000>]
(ret_fast_syscall+0x0/0x28)
[ 64.700510] Exception stack(0xe69dffa8 to 0xe69dfff0)
[ 64.705536] ffa0: 004391b8 00001770 00000004
00004142 00439340 00439340
[ 64.713684] ffc0: 004391b8 00001770 00001770 00000036 00001770
00000000 00000000 be8db940
[ 64.721828] ffe0: b6fa382c be8db8ec b6f32a3c b6e42bdc


Best regards,
Krzysztof

2019-02-06 11:02:05

by Charles Keepax

[permalink] [raw]
Subject: Re: [PATCH] ASoC: dapm: Check for NULL widget in dapm_update_dai_unlocked

On Wed, Feb 06, 2019 at 11:22:33AM +0100, Krzysztof Kozlowski wrote:
> On Wed, 6 Feb 2019 at 11:05, Charles Keepax
> <[email protected]> wrote:
> >
> > DAIs linked to the dummy will not have an associated playback/capture
> > widget, so we need to skip the update in that case.
> >
> > Fixes: 078a85f2806f ("ASoC: dapm: Only power up active channels from a DAI")
> > Signed-off-by: Charles Keepax <[email protected]>
> > ---
> >
> > Ok so that all makes sense, this patch is probably the best fix?
> >
> > Thanks,
> > Charles
>
> For this particular issue (NULL-pointer):
> Reported-by: Krzysztof Kozlowski <[email protected]>
> Tested-by: Krzysztof Kozlowski <[email protected]>
>
> However now I see bug sleeping in atomic context:
>
> [ 64.000828] BUG: sleeping function called from invalid context at
> ../kernel/locking/mutex.c:908

Does this probably definitely get fixed by reverting my patch?
It's just a bit odd as this seems to be complaining about a clock
operation in i2s_trigger and I don't think my patch should have
any affect on the trigger callback. It should get run from either
the dai_link DAPM event or hw_params, neither of which should
happen in an atomic context.

> [ 64.307534] the existing dependency chain (in reverse order) is:
> [ 64.314985]
> [ 64.314985] -> #1 (&(&pri_dai->spinlock)->rlock){....}:
> [ 64.321667] clk_mux_set_parent+0x34/0xb8
> [ 64.326162] clk_core_set_parent_nolock+0x21c/0x54c
> [ 64.331535] clk_set_parent+0x38/0x6c
> [ 64.335696] of_clk_set_defaults+0x11c/0x384
> [ 64.340460] of_clk_add_provider+0x8c/0xc8
> [ 64.345054] samsung_i2s_probe+0x484/0x64c
> [ 64.349651] platform_drv_probe+0x6c/0xa4
> [ 64.354153] really_probe+0x280/0x414
> [ 64.358311] driver_probe_device+0x78/0x1c4
> [ 64.362991] bus_for_each_drv+0x74/0xb8
> [ 64.367323] __device_attach+0xd4/0x16c
> [ 64.371655] bus_probe_device+0x88/0x90
> [ 64.375988] deferred_probe_work_func+0x4c/0xd0
> [ 64.381017] process_one_work+0x228/0x810
> [ 64.385520] worker_thread+0x30/0x570
> [ 64.389681] kthread+0x134/0x164
> [ 64.393405] ret_from_fork+0x14/0x20
> [ 64.397477] (null)
> [ 64.400249]
> [ 64.400249] -> #0 (prepare_lock){+.+.}:
> [ 64.405539] __mutex_lock+0x7c/0xa98
> [ 64.409610] mutex_lock_nested+0x1c/0x24
> [ 64.414029] clk_prepare_lock+0x50/0xf8
> [ 64.418362] clk_core_get_rate+0xc/0x60
> [ 64.422695] i2s_trigger+0x444/0x6c8
> [ 64.426768] soc_pcm_trigger+0x100/0x140
> [ 64.431188] snd_pcm_action_single+0x38/0x80
> [ 64.435953] snd_pcm_action+0x54/0x5c
> [ 64.440112] snd_pcm_action_lock_irq+0x28/0x3c
> [ 64.445052] snd_pcm_ioctl+0x920/0x123c
> [ 64.449386] do_vfs_ioctl+0xb0/0x9f8
> [ 64.453457] ksys_ioctl+0x34/0x5c
> [ 64.457269] ret_fast_syscall+0x0/0x28
> [ 64.461516] 0xbe8db8ec

Thanks,
Charles

2019-02-06 11:05:08

by Charles Keepax

[permalink] [raw]
Subject: Re: [BUG BISECT] NULL pointer after commit "ASoC: dapm: Only power up active channels from a DAI"

On Wed, Feb 06, 2019 at 11:11:03AM +0100, Sylwester Nawrocki wrote:
> On 2/6/19 10:46, Sylwester Nawrocki wrote:
> > On 2/5/19 22:16, Krzysztof Kozlowski wrote:
> >> Bisect pointed to commit:
> >> commit 078a85f2806f0ffd11289009462a6a390f9adb5c
> >> Author: Charles Keepax <[email protected]>
> >> Date: Thu Jan 31 13:30:18 2019 +0000
> >> ASoC: dapm: Only power up active channels from a DAI
> >>
> >> as a bad commit for NULL pointer on my Odroid XU3 and Odroid U3 board when
> >> doing "aplay /usr/share/sounds/alsa/Front_Right.wav".
> >>
> >> 1. Arch ARM Linux
> >> 2. exynos_defconfig
> >> 3. Odroid U3, XU3, Exynos SoC, ARMv7
> >>
> >> Last address is in calltrace:
> >> c079552c
> >> dapm_update_dai_unlocked
> >> sound/soc/soc-dapm.c:2586
> >
> > Thanks for bisecting this, I ran into same issue last night and I'm starting
> > to debug this now. I have added some debug prints and it looks like it oopses
> > on NULL playback_widget of the dummy DAI.
> >
> > [ 30.701182] hdmi-audio-codec hdmi-audio-codec.0.auto: Update DAI routes for i2s-hifi playback
> > [ 30.709630] dapm_update_dai_unlocked:2586 w=8bd27d28
> > [ 30.714403] dapm_update_dai_unlocked:2594 w=8bd27d28
> > [ 30.724688] max98090 5-0010: Update DAI routes for HiFi playback
> > [ 30.730163] dapm_update_dai_unlocked:2586 w=3fc942af
> > [ 30.735154] dapm_update_dai_unlocked:2594 w=3fc942af
> > [ 30.745051] snd-soc-dummy snd-soc-dummy: Update DAI routes for snd-soc-dummy-dai playback
> > [ 30.753128] dapm_update_dai_unlocked:2586 w= (null)
> > [ 30.758114] Unable to handle kernel NULL pointer dereference at virtual address 0000007c
>
> With a change as below there is no oops and everything works again,
> but I'm not sure this is a proper fix.
>

I think this is the proper fix, if we have DAIs which don't have
capture/playback widgets then there should be no parts of the
DAPM graph that need updated in relation to the number of
channels, so skipping the processing seems fine.

Thanks,
Charles

> diff --git a/sound/soc/soc-dapm.c b/sound/soc/soc-dapm.c
> index 5b74dffc9c11..111a23a9708a 100644
> --- a/sound/soc/soc-dapm.c
> +++ b/sound/soc/soc-dapm.c
> @@ -2580,6 +2580,9 @@ static int dapm_update_dai_unlocked(struct snd_pcm_substream *substream,
> else
> w = dai->capture_widget;
>
> + if (!w)
> + return 0;
> +
> dev_dbg(dai->dev, "Update DAI routes for %s %s\n", dai->name,
> dir == SNDRV_PCM_STREAM_PLAYBACK ? "playback" : "capture");
>
> --
> Thanks,
> Sylwester

2019-02-06 11:16:10

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: [PATCH] ASoC: dapm: Check for NULL widget in dapm_update_dai_unlocked

On Wed, 6 Feb 2019 at 12:00, Charles Keepax
<[email protected]> wrote:
>
> On Wed, Feb 06, 2019 at 11:22:33AM +0100, Krzysztof Kozlowski wrote:
> > On Wed, 6 Feb 2019 at 11:05, Charles Keepax
> > <[email protected]> wrote:
> > >
> > > DAIs linked to the dummy will not have an associated playback/capture
> > > widget, so we need to skip the update in that case.
> > >
> > > Fixes: 078a85f2806f ("ASoC: dapm: Only power up active channels from a DAI")
> > > Signed-off-by: Charles Keepax <[email protected]>
> > > ---
> > >
> > > Ok so that all makes sense, this patch is probably the best fix?
> > >
> > > Thanks,
> > > Charles
> >
> > For this particular issue (NULL-pointer):
> > Reported-by: Krzysztof Kozlowski <[email protected]>
> > Tested-by: Krzysztof Kozlowski <[email protected]>
> >
> > However now I see bug sleeping in atomic context:
> >
> > [ 64.000828] BUG: sleeping function called from invalid context at
> > ../kernel/locking/mutex.c:908
>
> Does this probably definitely get fixed by reverting my patch?
> It's just a bit odd as this seems to be complaining about a clock
> operation in i2s_trigger and I don't think my patch should have
> any affect on the trigger callback. It should get run from either
> the dai_link DAPM event or hw_params, neither of which should
> happen in an atomic context.

Before this fixup, probably NULL pointer happened before any of this.
I tried it now few times and the possible deadlock and sleeping in
invalid context did not appear. It might be random/racy or totally
unrelated to your change.

Best regards,
Krzysztof

2019-02-06 11:58:45

by Charles Keepax

[permalink] [raw]
Subject: Re: [PATCH] ASoC: dapm: Check for NULL widget in dapm_update_dai_unlocked

On Wed, Feb 06, 2019 at 12:14:56PM +0100, Krzysztof Kozlowski wrote:
> On Wed, 6 Feb 2019 at 12:00, Charles Keepax
> <[email protected]> wrote:
> >
> > On Wed, Feb 06, 2019 at 11:22:33AM +0100, Krzysztof Kozlowski wrote:
> > > On Wed, 6 Feb 2019 at 11:05, Charles Keepax
> > > <[email protected]> wrote:
> > > >
> > > > DAIs linked to the dummy will not have an associated playback/capture
> > > > widget, so we need to skip the update in that case.
> > > >
> > > > Fixes: 078a85f2806f ("ASoC: dapm: Only power up active channels from a DAI")
> > > > Signed-off-by: Charles Keepax <[email protected]>
> > > > ---
> > > >
> > > > Ok so that all makes sense, this patch is probably the best fix?
> > > >
> > > > Thanks,
> > > > Charles
> > >
> > > For this particular issue (NULL-pointer):
> > > Reported-by: Krzysztof Kozlowski <[email protected]>
> > > Tested-by: Krzysztof Kozlowski <[email protected]>
> > >
> > > However now I see bug sleeping in atomic context:
> > >
> > > [ 64.000828] BUG: sleeping function called from invalid context at
> > > ../kernel/locking/mutex.c:908
> >
> > Does this probably definitely get fixed by reverting my patch?
> > It's just a bit odd as this seems to be complaining about a clock
> > operation in i2s_trigger and I don't think my patch should have
> > any affect on the trigger callback. It should get run from either
> > the dai_link DAPM event or hw_params, neither of which should
> > happen in an atomic context.
>
> Before this fixup, probably NULL pointer happened before any of this.
> I tried it now few times and the possible deadlock and sleeping in
> invalid context did not appear. It might be random/racy or totally
> unrelated to your change.
>

Looking through I think this is an unrelated issue. Assuming the
driver in question is (sound/soc/samsung/i2s.c). Inside
i2s_trigger, there is a call to config_setup(i2s), which in turn
will call clk_get_rate if i2s->quirks contains the flag
QUIRK_NO_MUXPSR.

The trigger callback can be made from an atomic context and
clk_get_rate will take the prepare lock of the clock. The clock
prepare lock is always a mutex which shouldn't be locked from an
atomic context. So it seems like this will fail whenever that
QUIRK_NO_MUXPSR flag is set, no idea what causes that to be set.

It looks like this bug was introduced by this change:

647d04f8e07a ("ASoC: samsung: i2s: Ensure the RCLK rate is properly determined")

Thanks,
Charles

2019-02-06 15:58:30

by Sylwester Nawrocki

[permalink] [raw]
Subject: Re: [PATCH] ASoC: dapm: Check for NULL widget in dapm_update_dai_unlocked

On 2/6/19 12:45, Charles Keepax wrote:
> Looking through I think this is an unrelated issue. Assuming the
> driver in question is (sound/soc/samsung/i2s.c). Inside
> i2s_trigger, there is a call to config_setup(i2s), which in turn
> will call clk_get_rate if i2s->quirks contains the flag
> QUIRK_NO_MUXPSR.
>
> The trigger callback can be made from an atomic context and
> clk_get_rate will take the prepare lock of the clock. The clock
> prepare lock is always a mutex which shouldn't be locked from an
> atomic context. So it seems like this will fail whenever that
> QUIRK_NO_MUXPSR flag is set, no idea what causes that to be set.
>
> It looks like this bug was introduced by this change:
>
> 647d04f8e07a ("ASoC: samsung: i2s: Ensure the RCLK rate is properly determined")

Thanks or having a look at this. I somehow overlooked it before, there
are multiple issues with that clk_get_rate() call. Apart of the problem
described above config_setup() is already called with the i2s->lock
spinlock held, exactly same spinlock that is passed to the clock API
when registering a clock of which we try to get rate. :/ Presumably
it works due to clk rate caching.

Whether QUIRK_NO_MUXPSR flag is set or not depends on the HW type,
it is not set for modern SoCs so most of the time we will hit the
problem in I2S master configurations.

As we are not using set_sysclk() of the CPU DAI I'm thinking about
moving the clk_get_rate() call to the CPU DAI's hw_params() callback.
The clock rate may be changed in hw_params() of an ASoC machine driver,
and the CPU DAI needs to adjust to those changes.

It feels like locking in that driver might need revisiting, there is
quite a lot happening while holding a spinlock.

--
Thanks,
Sylwester