2019-01-11 15:46:11

by Pierre-Louis Bossart

[permalink] [raw]
Subject: Re: [alsa-devel] [PATCH] ASoC: soc-core: Fix null pointer dereference in soc_find_component


On 1/11/19 2:14 AM, Rohit kumar wrote:
> From: Ajit Pandey <[email protected]>
>
> soc_find_component() may lead to null pointer exception if both
> arguments i.e of_node and name is NULL. Add NULL check before
> calling soc_find_component(). Also fix some typos.

Thanks for the overnight fix. This update fixes the issue on my Skylake
XPS13 test device (blind testing since I don't understand what the code
does).

Tested-by: Pierre-Louis Bossart <[email protected]>

>
> Fixes: 8780cf1142a5 ("ASoC: soc-core: defer card probe until all component is added to list")
> Reported-by: Pierre-Louis Bossart <[email protected]>
> Signed-off-by: Ajit Pandey <[email protected]>
> Signed-off-by: Rohit kumar <[email protected]>
> ---
> sound/soc/soc-core.c | 15 +++++++++------
> 1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
> index 0934b36..df05fb8 100644
> --- a/sound/soc/soc-core.c
> +++ b/sound/soc/soc-core.c
> @@ -1131,11 +1131,13 @@ static int soc_init_dai_link(struct snd_soc_card *card,
> }
>
> /*
> - * Defer card registartion if platform dai component is not added to
> + * Defer card registration if platform dai component is not added to
> * component list.
> */
> - if (!soc_find_component(link->platform->of_node, link->platform->name))
> - return -EPROBE_DEFER;
> + if (link->platform->of_node || link->platform->name)
> + if (!soc_find_component(link->platform->of_node,
> + link->platform->name))
> + return -EPROBE_DEFER;
>
> /*
> * CPU device may be specified by either name or OF node, but
> @@ -1150,11 +1152,12 @@ static int soc_init_dai_link(struct snd_soc_card *card,
> }
>
> /*
> - * Defer card registartion if cpu dai component is not added to
> + * Defer card registration if cpu dai component is not added to
> * component list.
> */
> - if (!soc_find_component(link->cpu_of_node, link->cpu_name))
> - return -EPROBE_DEFER;
> + if (link->cpu_of_node || link->cpu_name)
> + if (!soc_find_component(link->cpu_of_node, link->cpu_name))
> + return -EPROBE_DEFER;
>
> /*
> * At least one of CPU DAI name or CPU device name/node must be


2019-01-11 21:51:20

by Pierre-Louis Bossart

[permalink] [raw]
Subject: Re: [alsa-devel] [PATCH] ASoC: soc-core: Fix null pointer dereference in soc_find_component


> Thanks for the overnight fix. This update fixes the issue on my
> Skylake XPS13 test device (blind testing since I don't understand what
> the code does).
>
> Tested-by: Pierre-Louis Bossart <[email protected]>

I need to take this back, this set of changes (initial+fix) causes an
error with our HDMI support

[   17.437684] sof-audio sof-audio: created machine bxt-pcm512x
[   17.585279] bxt-pcm512x bxt-pcm512x: ASoC: failed to init link iDisp1
[   17.585639] bxt-pcm512x bxt-pcm512x: snd_soc_register_card failed -517

Removing your changes restores the functionality

Adding some traces I can see that the the platform name we use doesn't
seem compatible with your logic. All the Intel boards used a constant
platform name matching the PCI ID, see e.g. [1], which IIRC is used to
bind components. Liam, do you recall in more details if this is really
required?

[1]
https://elixir.bootlin.com/linux/latest/source/sound/soc/intel/boards/bxt_da7219_max98357a.c#L475

[   18.205812] plb: platform name sof-audio
[   18.206059] plb: cpu_name (null)
[   18.206234] plb: platform name 0000:00:0e.0
[   18.206459] plb: returning -EPROBE_DEFER 1
[   18.206686] bxt-pcm512x bxt-pcm512x: ASoC: failed to init link iDisp1
[   18.207054] bxt-pcm512x bxt-pcm512x: snd_soc_register_card failed -517

diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
index cbafbdd02483..ae731212f82b 100644
--- a/sound/soc/soc-core.c
+++ b/sound/soc/soc-core.c
@@ -1133,11 +1133,15 @@ static int soc_init_dai_link(struct snd_soc_card
*card,
         * Defer card registration if platform dai component is not
added to
         * component list.
         */
+       pr_err("plb: platform name %s\n", link->platform->name);
        if (link->platform->of_node || link->platform->name)
                if (!soc_find_component(link->platform->of_node,
link->platform->name))
-                       return -EPROBE_DEFER;
+                 {
+                         pr_err("plb: returning -EPROBE_DEFER 1\n");
+                         return -EPROBE_DEFER;

+                 }
        /*
         * CPU device may be specified by either name or OF node, but
         * can be left unspecified, and will be matched based on DAI
@@ -1154,9 +1158,14 @@ static int soc_init_dai_link(struct snd_soc_card
*card,
         * Defer card registration if cpu dai component is not added to
         * component list.
         */
+       pr_err("plb: cpu_name %s\n", link->cpu_name);
        if (link->cpu_of_node || link->cpu_name)
                if (!soc_find_component(link->cpu_of_node, link->cpu_name))
-                       return -EPROBE_DEFER;
+                 {
+                         pr_err("plb: returning -EPROBE_DEFER 2\n");
+                         return -EPROBE_DEFER;
+
+                 }

        /*
         * At least one of CPU DAI name or CPU device name/node must be


2019-01-12 06:09:58

by Rohit Kumar

[permalink] [raw]
Subject: Re: [alsa-devel] [PATCH] ASoC: soc-core: Fix null pointer dereference in soc_find_component

On 1/12/2019 3:19 AM, Pierre-Louis Bossart wrote:
>
>> Thanks for the overnight fix. This update fixes the issue on my
>> Skylake XPS13 test device (blind testing since I don't understand
>> what the code does).
>>
>> Tested-by: Pierre-Louis Bossart <[email protected]>
>
> I need to take this back, this set of changes (initial+fix) causes an
> error with our HDMI support
>
> [ 17.437684] sof-audio sof-audio: created machine bxt-pcm512x
> [ 17.585279] bxt-pcm512x bxt-pcm512x: ASoC: failed to init link iDisp1
> [ 17.585639] bxt-pcm512x bxt-pcm512x: snd_soc_register_card failed -517
>
> Removing your changes restores the functionality
>
Looks like we should revert generic implementation for defering probe
and move to call from machine driver as done in v1.
https://lore.kernel.org/patchwork/patch/1027560/
https://lore.kernel.org/patchwork/patch/1027561/

@Mark, Do you have suggestion to refine current patch?
> Adding some traces I can see that the the platform name we use doesn't
> seem compatible with your logic. All the Intel boards used a constant
> platform name matching the PCI ID, see e.g. [1], which IIRC is used to
> bind components. Liam, do you recall in more details if this is really
> required?
>
> [1]
> https://elixir.bootlin.com/linux/latest/source/sound/soc/intel/boards/bxt_da7219_max98357a.c#L475
I think if it does not set platform name as 0000:00:0e.0, it should take
snd-soc-dummy as platform name
and that might fix the issue.
snd-soc-dummy does have component associated with it.
https://elixir.bootlin.com/linux/latest/source/sound/soc/soc-utils.c#L281
>
> [ 18.205812] plb: platform name sof-audio
> [ 18.206059] plb: cpu_name (null)
> [ 18.206234] plb: platform name 0000:00:0e.0
> [ 18.206459] plb: returning -EPROBE_DEFER 1
> [ 18.206686] bxt-pcm512x bxt-pcm512x: ASoC: failed to init link iDisp1
> [ 18.207054] bxt-pcm512x bxt-pcm512x: snd_soc_register_card failed -517
>
> diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
> index cbafbdd02483..ae731212f82b 100644
> --- a/sound/soc/soc-core.c
> +++ b/sound/soc/soc-core.c
> @@ -1133,11 +1133,15 @@ static int soc_init_dai_link(struct
> snd_soc_card *card,
> * Defer card registration if platform dai component is not
> added to
> * component list.
> */
> + pr_err("plb: platform name %s\n", link->platform->name);
> if (link->platform->of_node || link->platform->name)
> if (!soc_find_component(link->platform->of_node,
> link->platform->name))
> - return -EPROBE_DEFER;
> + {
> + pr_err("plb: returning -EPROBE_DEFER 1\n");
> + return -EPROBE_DEFER;
>
> + }
> /*
> * CPU device may be specified by either name or OF node, but
> * can be left unspecified, and will be matched based on DAI
> @@ -1154,9 +1158,14 @@ static int soc_init_dai_link(struct
> snd_soc_card *card,
> * Defer card registration if cpu dai component is not added to
> * component list.
> */
> + pr_err("plb: cpu_name %s\n", link->cpu_name);
> if (link->cpu_of_node || link->cpu_name)
> if (!soc_find_component(link->cpu_of_node,
> link->cpu_name))
> - return -EPROBE_DEFER;
> + {
> + pr_err("plb: returning -EPROBE_DEFER 2\n");
> + return -EPROBE_DEFER;
> +
> + }
>
> /*
> * At least one of CPU DAI name or CPU device name/node must be
>

Thanks,
Rohit
--
Qualcomm INDIA, on behalf of Qualcomm Innovation Center, Inc.is a member
of the Code Aurora Forum, hosted by the Linux Foundation.

---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus


2019-01-14 15:42:35

by Liam Girdwood

[permalink] [raw]
Subject: Re: [alsa-devel] [PATCH] ASoC: soc-core: Fix null pointer dereference in soc_find_component

On Sat, 2019-01-12 at 11:37 +0530, Rohit kumar wrote:
> On 1/12/2019 3:19 AM, Pierre-Louis Bossart wrote:
> > > Thanks for the overnight fix. This update fixes the issue on my
> > > Skylake XPS13 test device (blind testing since I don't understand
> > > what the code does).
> > >
> > > Tested-by: Pierre-Louis Bossart <[email protected]>
> >
> > I need to take this back, this set of changes (initial+fix) causes an
> > error with our HDMI support
> >
> > [ 17.437684] sof-audio sof-audio: created machine bxt-pcm512x
> > [ 17.585279] bxt-pcm512x bxt-pcm512x: ASoC: failed to init link iDisp1
> > [ 17.585639] bxt-pcm512x bxt-pcm512x: snd_soc_register_card failed -517
> >
> > Removing your changes restores the functionality
> >
> Looks like we should revert generic implementation for defering probe
> and move to call from machine driver as done in v1.
> https://lore.kernel.org/patchwork/patch/1027560/
> https://lore.kernel.org/patchwork/patch/1027561/
>
> @Mark, Do you have suggestion to refine current patch?
> > Adding some traces I can see that the the platform name we use doesn't
> > seem compatible with your logic. All the Intel boards used a constant
> > platform name matching the PCI ID, see e.g. [1], which IIRC is used to
> > bind components. Liam, do you recall in more details if this is really
> > required?

Sorry, I cant quite remember why the PCI ID was used for the platform name, I
think it started with the SKL generation as previous generations used "baytrail-
pcm" and "haswell-pcm" as platform names IIRC. Perhaps Vinod will know.

The platform name is only used by SOF when over writing the "legacy" platform
name e.g. "baytrail-pcm" would become "sof-audio" and this is only used for
binding DAI links (so that all legacy machine drivers can be reused without
modification).

Liam


2019-01-15 00:07:43

by Mark Brown

[permalink] [raw]
Subject: Re: [alsa-devel] [PATCH] ASoC: soc-core: Fix null pointer dereference in soc_find_component

On Fri, Jan 11, 2019 at 03:49:08PM -0600, Pierre-Louis Bossart wrote:

> Adding some traces I can see that the the platform name we use doesn't seem
> compatible with your logic. All the Intel boards used a constant platform
> name matching the PCI ID, see e.g. [1], which IIRC is used to bind
> components. Liam, do you recall in more details if this is really required?

That's telling me that either snd_soc_find_components() isn't finding
components in the same way that we do when we bind things which isn't
good or we're binding links without having fully matched everything on
the link which also isn't good.

Without a system that shows the issue I can't 100% confirm but I think
it's the former - I'm fairly sure that those machines are relying on the
component name being initialized to fmt_single_name() in
snd_soc_component_initialize(). That is supposed to wind up using
dev_name() (which would be the PCI address for a PCI device) as the
basis of the name. What I can't currently see is how exactly that gets
bound (or how any of the other links avoid trouble for that matter). We
could revert and push this into cards but I would rather be confident
that we understand what's going on, I'm not comfortable that we aren't
just pushing the breakage around rather than fixing it. Can someone
with an x86 system take a look and confirm exactly what's going on with
binding these cards please?


Attachments:
(No filename) (1.40 kB)
signature.asc (499.00 B)
Download all attachments

2019-01-15 03:30:04

by Pierre-Louis Bossart

[permalink] [raw]
Subject: Re: [alsa-devel] [PATCH] ASoC: soc-core: Fix null pointer dereference in soc_find_component


On 1/14/19 6:06 PM, Mark Brown wrote:
> On Fri, Jan 11, 2019 at 03:49:08PM -0600, Pierre-Louis Bossart wrote:
>
>> Adding some traces I can see that the the platform name we use doesn't seem
>> compatible with your logic. All the Intel boards used a constant platform
>> name matching the PCI ID, see e.g. [1], which IIRC is used to bind
>> components. Liam, do you recall in more details if this is really required?
> That's telling me that either snd_soc_find_components() isn't finding
> components in the same way that we do when we bind things which isn't
> good or we're binding links without having fully matched everything on
> the link which also isn't good.
>
> Without a system that shows the issue I can't 100% confirm but I think
> it's the former - I'm fairly sure that those machines are relying on the
> component name being initialized to fmt_single_name() in
> snd_soc_component_initialize(). That is supposed to wind up using
> dev_name() (which would be the PCI address for a PCI device) as the
> basis of the name. What I can't currently see is how exactly that gets
> bound (or how any of the other links avoid trouble for that matter). We
> could revert and push this into cards but I would rather be confident
> that we understand what's going on, I'm not comfortable that we aren't
> just pushing the breakage around rather than fixing it. Can someone
> with an x86 system take a look and confirm exactly what's going on with
> binding these cards please?

I am actually not sure at all why we need the platform_name to be set in
Intel machine drivers.

I ran a 5mn experiment with SOF and we completely ignore what is set by
machine drivers, it's overridden by the component name:

??? ??? ??? dev_info(card->dev, "info: override FE DAI link %s\n",
??? ??? ??? ??? ?card->dai_link[i].name);

??? ??? ??? /* override platform component */
??? ??? ??? if (snd_soc_init_platform(card, dai_link) < 0) {
??? ??? ??? ??? dev_err(card->dev, "init platform error");
??? ??? ??? ??? continue;
??? ??? ??? }
??? ??? ??? pr_err("plb: platform_name was %s, now %s\n",
??? ??? ??? ?????? dai_link->platform->name,
??? ??? ??? ?????? component->name);

??? ??? ??? dai_link->platform->name = component->name;

existing machine driver (info is incorrect btw, should be BE DAI link)

[?? 36.628466] bxt-pcm512x bxt-pcm512x: info: override FE DAI link
SSP5-Codec
[?? 36.628469] plb: platform_name was sof-audio, now sof-audio
[?? 36.628772] bxt-pcm512x bxt-pcm512x: info: override FE DAI link iDisp1
[?? 36.628773] plb: platform_name was 0000:00:0e.0, now sof-audio
[?? 36.629111] bxt-pcm512x bxt-pcm512x: info: override FE DAI link iDisp2
[?? 36.629112] plb: platform_name was 0000:00:0e.0, now sof-audio
[?? 36.629422] bxt-pcm512x bxt-pcm512x: info: override FE DAI link iDisp3
[?? 36.629423] plb: platform_name was 0000:00:0e.0, now sof-audio

machine driver with all platform_name commented out, no regression at all.

[?? 15.839652] sof-audio sof-audio: created machine bxt-pcm512x
[?? 15.846990] bxt-pcm512x bxt-pcm512x: info: override FE DAI link
SSP5-Codec
[?? 15.846995] plb: platform_name was snd-soc-dummy, now sof-audio
[?? 15.847325] bxt-pcm512x bxt-pcm512x: info: override FE DAI link iDisp1
[?? 15.847326] plb: platform_name was snd-soc-dummy, now sof-audio
[?? 15.847657] bxt-pcm512x bxt-pcm512x: info: override FE DAI link iDisp2
[?? 15.847658] plb: platform_name was snd-soc-dummy, now sof-audio
[?? 15.847974] bxt-pcm512x bxt-pcm512x: info: override FE DAI link iDisp3
[?? 15.847974] plb: platform_name was snd-soc-dummy, now sof-audio

I checked for a bit longer with the Skylake driver, I also couldn't see
a difference with or without the platform_name set.

diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
index 0462b3ec977a..0fdf99ec17cd 100644
--- a/sound/soc/soc-core.c
+++ b/sound/soc/soc-core.c
@@ -918,7 +918,7 @@ static int soc_bind_dai_link(struct snd_soc_card *card,
??????????????? if (!snd_soc_is_matching_component(dai_link->platform,
?????????????????????????????????????????????????? component))
??????????????????????? continue;
-
+?????????????? pr_err("plb: binding with component_name %s \n",
component->name);
??????????????? snd_soc_rtdcom_add(rtd, component);
??????? }

@@ -1041,6 +1041,8 @@ static int snd_soc_init_platform(struct
snd_soc_card *card,
??????????????? if (!platform)
??????????????????????? return -ENOMEM;

+?????????????? pr_err("plb: %s: plaform->name set to %s\n", __func__,
+????????????????????? dai_link->platform_name);
??????????????? dai_link->platform????? = platform;
??????????????? platform->name????????? = dai_link->platform_name;
??????????????? platform->of_node?????? = dai_link->platform_of_node;

[??? 1.345143] plb: snd_soc_init_platform: plaform->name set to 0000:00:1f.3
[??? 1.345148] skl_hda_dsp_generic skl_hda_dsp_generic: ASoC: binding
CNL Audio
[??? 1.345151] plb: binding with component_name 0000:00:1f.3
[??? 1.345153] plb: snd_soc_init_platform: plaform->name set to 0000:00:1f.3
[??? 1.345154] skl_hda_dsp_generic skl_hda_dsp_generic: ASoC: binding
CNL HDMI1
[??? 1.345155] plb: binding with component_name 0000:00:1f.3
[??? 1.345157] plb: snd_soc_init_platform: plaform->name set to 0000:00:1f.3
[??? 1.345158] skl_hda_dsp_generic skl_hda_dsp_generic: ASoC: binding
CNL HDMI2
[??? 1.345159] plb: binding with component_name 0000:00:1f.3
[??? 1.345161] plb: snd_soc_init_platform: plaform->name set to 0000:00:1f.3
[??? 1.345162] skl_hda_dsp_generic skl_hda_dsp_generic: ASoC: binding
CNL HDMI3
[??? 1.345163] plb: binding with component_name 0000:00:1f.3

[??? 1.349607] plb: snd_soc_init_platform: plaform->name set to (null)
[??? 1.349613] skl_hda_dsp_generic skl_hda_dsp_generic: ASoC: binding
CNL Audio
[??? 1.349617] plb: binding with component_name snd-soc-dummy
[??? 1.349619] plb: binding with component_name snd-soc-dummy
[??? 1.349621] plb: snd_soc_init_platform: plaform->name set to (null)
[??? 1.349623] skl_hda_dsp_generic skl_hda_dsp_generic: ASoC: binding
CNL HDMI1
[??? 1.349625] plb: binding with component_name snd-soc-dummy
[??? 1.349626] plb: binding with component_name snd-soc-dummy
[??? 1.349628] plb: snd_soc_init_platform: plaform->name set to (null)
[??? 1.349631] skl_hda_dsp_generic skl_hda_dsp_generic: ASoC: binding
CNL HDMI2
[??? 1.349632] plb: binding with component_name snd-soc-dummy
[??? 1.349633] plb: binding with component_name snd-soc-dummy
[??? 1.349635] plb: snd_soc_init_platform: plaform->name set to (null)
[??? 1.349638] skl_hda_dsp_generic skl_hda_dsp_generic: ASoC: binding
CNL HDMI3
[??? 1.349639] plb: binding with component_name snd-soc-dummy
[??? 1.349641] plb: binding with component_name snd-soc-dummy

Audio playback works in both cases.

The funny thing is that the list of components is right in
/sys/kernel/debug/asoc.

My guess is that the required PCI component is already added when the
platform_name is used in soc_bind_dai_link() and snd_soc_rtdcom_add()
does nothing for the back-end, so the dailink platform_name has actually
zero influence on the outcome.

Another way to look at it is that we already provide the
dai_link->cpu_dai_name which is enough for soc_bind_dai_link() to find
the relevant component and as a result the dailink->platform_name seems
redundant/unnecessary. I am using the conditional here since this part
of the code (Intel driver included) seems to work by accident rather
than by design, and we'd need a set of additional tests before removing
these hard-coded initializations.

Does this help?

2019-01-16 10:02:11

by Pierre-Louis Bossart

[permalink] [raw]
Subject: Re: [alsa-devel] [PATCH] ASoC: soc-core: Fix null pointer dereference in soc_find_component


On 1/14/19 6:06 PM, Mark Brown wrote:
> On Fri, Jan 11, 2019 at 03:49:08PM -0600, Pierre-Louis Bossart wrote:
>
>> Adding some traces I can see that the the platform name we use doesn't seem
>> compatible with your logic. All the Intel boards used a constant platform
>> name matching the PCI ID, see e.g. [1], which IIRC is used to bind
>> components. Liam, do you recall in more details if this is really required?
> That's telling me that either snd_soc_find_components() isn't finding
> components in the same way that we do when we bind things which isn't
> good or we're binding links without having fully matched everything on
> the link which also isn't good.
>
> Without a system that shows the issue I can't 100% confirm but I think
> it's the former - I'm fairly sure that those machines are relying on the
> component name being initialized to fmt_single_name() in
> snd_soc_component_initialize(). That is supposed to wind up using
> dev_name() (which would be the PCI address for a PCI device) as the
> basis of the name. What I can't currently see is how exactly that gets
> bound (or how any of the other links avoid trouble for that matter). We
> could revert and push this into cards but I would rather be confident
> that we understand what's going on, I'm not comfortable that we aren't
> just pushing the breakage around rather than fixing it. Can someone
> with an x86 system take a look and confirm exactly what's going on with
> binding these cards please?

Beyond the fact that the platform_name seems to be totally useless,
additional tests show that the patch ('ASoC: soc-core: defer card probe
until all component is added to list') adds a new restriction which
contradicts existing error checks.

None of the Intel machine drivers set the dailink "cpu_name" field but
use the "cpu_dai_name" field instead. This was perfectly legit as
documented by the code at the end of soc_init_dai_link()

??? /*
??? ?* At least one of CPU DAI name or CPU device name/node must be
??? ?* specified
??? ?*/
??? if (!link->cpu_dai_name &&
??? ??? !(link->cpu_name || link->cpu_of_node)) {
??? ??? dev_err(card->dev,
??? ??? ??? "ASoC: Neither cpu_dai_name nor cpu_name/of_node are set
for %s\n",
??? ??? ??? link->name);
??? ??? return -EINVAL;
??? }

The code contributed by Qualcomm only checks for cpu_name, which
prevents the init from completing.

So if we want to be consistent, the new code should be something like:

diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
index b680c673c553..2791da9417f8 100644
--- a/sound/soc/soc-core.c
+++ b/sound/soc/soc-core.c
@@ -1154,7 +1154,7 @@ static int soc_init_dai_link(struct snd_soc_card
*card,
???????? * Defer card registartion if cpu dai component is not added to
???????? * component list.
???????? */
-?????? if (!soc_find_component(link->cpu_of_node, link->cpu_name))
+?????? if (!link->cpu_dai_name &&
!soc_find_component(link->cpu_of_node, link->cpu_name))
??????????????? return -EPROBE_DEFER;

??????? /*

or try to call soc_find_component with both cpu_name or cpu_dai_name, if
this makes sense?

2019-01-16 12:51:54

by Mark Brown

[permalink] [raw]
Subject: Re: [alsa-devel] [PATCH] ASoC: soc-core: Fix null pointer dereference in soc_find_component

On Tue, Jan 15, 2019 at 01:35:07PM -0600, Pierre-Louis Bossart wrote:
> On 1/14/19 6:06 PM, Mark Brown wrote:

> > just pushing the breakage around rather than fixing it. Can someone
> > with an x86 system take a look and confirm exactly what's going on with
> > binding these cards please?

> Beyond the fact that the platform_name seems to be totally useless,
> additional tests show that the patch ('ASoC: soc-core: defer card probe
> until all component is added to list') adds a new restriction which
> contradicts existing error checks.

Yes... I'd been coming to the conclusion that it was a huge red herring
here.

> So if we want to be consistent, the new code should be something like:

> diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
> index b680c673c553..2791da9417f8 100644
> --- a/sound/soc/soc-core.c
> +++ b/sound/soc/soc-core.c
> @@ -1154,7 +1154,7 @@ static int soc_init_dai_link(struct snd_soc_card
> *card,
> ???????? * Defer card registartion if cpu dai component is not added to
> ???????? * component list.
> ???????? */
> -?????? if (!soc_find_component(link->cpu_of_node, link->cpu_name))
> +?????? if (!link->cpu_dai_name && !soc_find_component(link->cpu_of_node,
> link->cpu_name))
> ??????????????? return -EPROBE_DEFER;
>
> ??????? /*

> or try to call soc_find_component with both cpu_name or cpu_dai_name, if
> this makes sense?

I think calling _find_component() makes more sense here as it will do
the check it's actually there thing no matter how the link is
identified. Assuming that does resolve the issue do you want to make a
patch given that you got there first?


Attachments:
(No filename) (1.62 kB)
signature.asc (499.00 B)
Download all attachments

2019-01-16 12:52:21

by Matthias Reichl

[permalink] [raw]
Subject: Re: [alsa-devel] [PATCH] ASoC: soc-core: Fix null pointer dereference in soc_find_component

On Tue, Jan 15, 2019 at 01:35:07PM -0600, Pierre-Louis Bossart wrote:
>
> On 1/14/19 6:06 PM, Mark Brown wrote:
> > On Fri, Jan 11, 2019 at 03:49:08PM -0600, Pierre-Louis Bossart wrote:
> >
> > > Adding some traces I can see that the the platform name we use doesn't seem
> > > compatible with your logic. All the Intel boards used a constant platform
> > > name matching the PCI ID, see e.g. [1], which IIRC is used to bind
> > > components. Liam, do you recall in more details if this is really required?
> > That's telling me that either snd_soc_find_components() isn't finding
> > components in the same way that we do when we bind things which isn't
> > good or we're binding links without having fully matched everything on
> > the link which also isn't good.
> >
> > Without a system that shows the issue I can't 100% confirm but I think
> > it's the former - I'm fairly sure that those machines are relying on the
> > component name being initialized to fmt_single_name() in
> > snd_soc_component_initialize(). That is supposed to wind up using
> > dev_name() (which would be the PCI address for a PCI device) as the
> > basis of the name. What I can't currently see is how exactly that gets
> > bound (or how any of the other links avoid trouble for that matter). We
> > could revert and push this into cards but I would rather be confident
> > that we understand what's going on, I'm not comfortable that we aren't
> > just pushing the breakage around rather than fixing it. Can someone
> > with an x86 system take a look and confirm exactly what's going on with
> > binding these cards please?
>
> Beyond the fact that the platform_name seems to be totally useless,
> additional tests show that the patch ('ASoC: soc-core: defer card probe
> until all component is added to list') adds a new restriction which
> contradicts existing error checks.
>
> None of the Intel machine drivers set the dailink "cpu_name" field but use
> the "cpu_dai_name" field instead. This was perfectly legit as documented by
> the code at the end of soc_init_dai_link()

This should be fixed by the patch
"ASoC: core: Don't defer probe on optional, NULL components" which Mark
already applied to his tree. See
http://mailman.alsa-project.org/pipermail/alsa-devel/2019-January/144323.html

Maybe the defer card probe logic needs to be extended to also check if
dai_link_name had already been registered (either cpu or cpu_dai_name
needs to be set), not 100% sure which problem the defer card probe patch
was trying to solve.

so long,

Hias

>
> ??? /*
> ??? ?* At least one of CPU DAI name or CPU device name/node must be
> ??? ?* specified
> ??? ?*/
> ??? if (!link->cpu_dai_name &&
> ??? ??? !(link->cpu_name || link->cpu_of_node)) {
> ??? ??? dev_err(card->dev,
> ??? ??? ??? "ASoC: Neither cpu_dai_name nor cpu_name/of_node are set for
> %s\n",
> ??? ??? ??? link->name);
> ??? ??? return -EINVAL;
> ??? }
>
> The code contributed by Qualcomm only checks for cpu_name, which prevents
> the init from completing.
>
> So if we want to be consistent, the new code should be something like:
>
> diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
> index b680c673c553..2791da9417f8 100644
> --- a/sound/soc/soc-core.c
> +++ b/sound/soc/soc-core.c
> @@ -1154,7 +1154,7 @@ static int soc_init_dai_link(struct snd_soc_card
> *card,
> ???????? * Defer card registartion if cpu dai component is not added to
> ???????? * component list.
> ???????? */
> -?????? if (!soc_find_component(link->cpu_of_node, link->cpu_name))
> +?????? if (!link->cpu_dai_name && !soc_find_component(link->cpu_of_node,
> link->cpu_name))
> ??????????????? return -EPROBE_DEFER;
>
> ??????? /*
>
> or try to call soc_find_component with both cpu_name or cpu_dai_name, if
> this makes sense?
> _______________________________________________
> Alsa-devel mailing list
> [email protected]
> http://mailman.alsa-project.org/mailman/listinfo/alsa-devel

2019-01-16 13:12:51

by Pierre-Louis Bossart

[permalink] [raw]
Subject: Re: [alsa-devel] [PATCH] ASoC: soc-core: Fix null pointer dereference in soc_find_component


>> Beyond the fact that the platform_name seems to be totally useless,
>> additional tests show that the patch ('ASoC: soc-core: defer card probe
>> until all component is added to list') adds a new restriction which
>> contradicts existing error checks.
>>
>> None of the Intel machine drivers set the dailink "cpu_name" field but use
>> the "cpu_dai_name" field instead. This was perfectly legit as documented by
>> the code at the end of soc_init_dai_link()
> This should be fixed by the patch
> "ASoC: core: Don't defer probe on optional, NULL components" which Mark
> already applied to his tree. See
> http://mailman.alsa-project.org/pipermail/alsa-devel/2019-January/144323.html

Ah yes, I missed this patch while I was debugging. Indeed this fixes the
problem and my devices work again with Mark's for-next branch. Thanks
Matthias!

>
> Maybe the defer card probe logic needs to be extended to also check if
> dai_link_name had already been registered (either cpu or cpu_dai_name
> needs to be set), not 100% sure which problem the defer card probe patch
> was trying to solve.
same here, I don't get why the deferred probe stuff only deals with one
of the two options.


2019-01-16 13:38:44

by Mark Brown

[permalink] [raw]
Subject: Re: [alsa-devel] [PATCH] ASoC: soc-core: Fix null pointer dereference in soc_find_component

On Tue, Jan 15, 2019 at 03:16:57PM -0600, Pierre-Louis Bossart wrote:

> > Maybe the defer card probe logic needs to be extended to also check if
> > dai_link_name had already been registered (either cpu or cpu_dai_name
> > needs to be set), not 100% sure which problem the defer card probe patch
> > was trying to solve.

We were getting cards probing without the platforms being registered
(which in turn meant we were just skipping their init) and had patches
proposed to implement the deferral in the cards. The deferral stuff is
supposed to making sure that everything is registered when we
instantiate.

> same here, I don't get why the deferred probe stuff only deals with one of
> the two options.

I think it's just an oversight - I think the change you were proposing
to check the cpu_dai_name is a good idea anyway as it makes things more
consistent and work more obviously by intention. And more generally if
we can simplify the code by removing legacy options that'd be good but
that's a bigger bit of work...


Attachments:
(No filename) (1.02 kB)
signature.asc (499.00 B)
Download all attachments

2019-01-16 14:21:33

by Matthias Reichl

[permalink] [raw]
Subject: Re: [alsa-devel] [PATCH] ASoC: soc-core: Fix null pointer dereference in soc_find_component

On Tue, Jan 15, 2019 at 09:41:38PM +0000, Mark Brown wrote:
> On Tue, Jan 15, 2019 at 03:16:57PM -0600, Pierre-Louis Bossart wrote:
>
> > > Maybe the defer card probe logic needs to be extended to also check if
> > > dai_link_name had already been registered (either cpu or cpu_dai_name
> > > needs to be set), not 100% sure which problem the defer card probe patch
> > > was trying to solve.
>
> We were getting cards probing without the platforms being registered
> (which in turn meant we were just skipping their init) and had patches
> proposed to implement the deferral in the cards. The deferral stuff is
> supposed to making sure that everything is registered when we
> instantiate.

Ah, that makes sense. Thanks a lot for the info!

so long,

Hias

>
> > same here, I don't get why the deferred probe stuff only deals with one of
> > the two options.
>
> I think it's just an oversight - I think the change you were proposing
> to check the cpu_dai_name is a good idea anyway as it makes things more
> consistent and work more obviously by intention. And more generally if
> we can simplify the code by removing legacy options that'd be good but
> that's a bigger bit of work...



2019-01-18 23:12:12

by Pierre-Louis Bossart

[permalink] [raw]
Subject: Re: [alsa-devel] [PATCH] ASoC: soc-core: Fix null pointer dereference in soc_find_component


On 1/15/19 3:16 PM, Pierre-Louis Bossart wrote:
>
>>> Beyond the fact that the platform_name seems to be totally useless,
>>> additional tests show that the patch ('ASoC: soc-core: defer card probe
>>> until all component is added to list') adds a new restriction which
>>> contradicts existing error checks.
>>>
>>> None of the Intel machine drivers set the dailink "cpu_name" field
>>> but use
>>> the "cpu_dai_name" field instead. This was perfectly legit as
>>> documented by
>>> the code at the end of soc_init_dai_link()
>> This should be fixed by the patch
>> "ASoC: core: Don't defer probe on optional, NULL components" which Mark
>> already applied to his tree. See
>> http://mailman.alsa-project.org/pipermail/alsa-devel/2019-January/144323.html
>>
>
> Ah yes, I missed this patch while I was debugging. Indeed this fixes
> the problem and my devices work again with Mark's for-next branch.
> Thanks Matthias!

This PROBE_DEFER support actually breaks the topology override that
we've been relying on for SOF (and which has been in Mark's branch for
some time now). This override helps us reuse machine drivers between
legacy and SOF-based solutions.

With the current code, the tests in soc_register_card() complain that
the platform_name can't be tied to a component and stop the card
registration, but that's mainly because the tests are done before the
topology overrides are done in soc_check_tplg_fes(). Moving
soc_check_tplg_fes() from soc_instantiate_card() to an earlier time in
soc_register_card() works-around the problem but looks quite invasive
(mutex lock, etc).

There is also a second problem where we seem to have a memory management
issue root caused to the change in snd_soc_init_platform() added by
09ac6a817bd6 ('ASoC: soc-core: fix init platform memory handling')

The code does this

static int snd_soc_init_platform(struct snd_soc_card *card,
                 struct snd_soc_dai_link *dai_link)
{
    struct snd_soc_dai_link_component *platform = dai_link->platform;


    /* convert Legacy platform link */
    if (!platform || dai_link->legacy_platform) {
        platform = devm_kzalloc(card->dev,
                sizeof(struct snd_soc_dai_link_component),
                GFP_KERNEL);
        if (!platform)
            return -ENOMEM;

        dai_link->platform      = platform;
        dai_link->legacy_platform = 1;

This last assignment guarantees that memory will be allocated every time
this function is called, and whatever overrides are done later will
themselves be overridden by the new allocation. I am not sure what the
intent was here, Curtis can you please double-check?

Details, test code and logs are available here:
https://github.com/thesofproject/linux/issues/565

Have a nice week-end everyone, that's it for me until Tuesday.

-Pierre




2019-01-21 19:19:31

by Mark Brown

[permalink] [raw]
Subject: Re: [alsa-devel] [PATCH] ASoC: soc-core: Fix null pointer dereference in soc_find_component

On Fri, Jan 18, 2019 at 05:02:08PM -0600, Pierre-Louis Bossart wrote:

> This PROBE_DEFER support actually breaks the topology override that we've
> been relying on for SOF (and which has been in Mark's branch for some time
> now). This override helps us reuse machine drivers between legacy and
> SOF-based solutions.

> With the current code, the tests in soc_register_card() complain that the
> platform_name can't be tied to a component and stop the card registration,
> but that's mainly because the tests are done before the topology overrides
> are done in soc_check_tplg_fes(). Moving soc_check_tplg_fes() from
> soc_instantiate_card() to an earlier time in soc_register_card()
> works-around the problem but looks quite invasive (mutex lock, etc).

Right, I see. I don't think there's going to be any non-invasive
solution here, either we need to stop overriding bits of the machine
driver relevant to binding in the topology files like this or we need to
look at the topology files in the probe deferral stuff so it's either
going to be very invasive for the machine driver or the core. I'm
tempted to say that the machine driver is a better option here.

> This last assignment guarantees that memory will be allocated every time
> this function is called, and whatever overrides are done later will
> themselves be overridden by the new allocation. I am not sure what the
> intent was here, Curtis can you please double-check?

The intent is to avoid modifying statically allocated data so we can
tell the difference between pointers set up statically as init data and
those that are filled in at runtime.


Attachments:
(No filename) (1.61 kB)
signature.asc (499.00 B)
Download all attachments