2022-09-27 08:35:04

by Pierre-Louis Bossart

[permalink] [raw]
Subject: Re: [PATCH] ASoC: soc-pcm: fix fe and be race when accessing substream->runtime



On 9/26/22 18:35, Eugeniu Rosca wrote:
> From: xiao jin <[email protected]>
>
> After start of fe and be, fe might go to close without triggering
> STOP, and substream->runtime is freed. However, be is still at
> START state and its substream->runtime still points to the
> freed runtime.
>
> Later on, FE is opened/started again, and triggers STOP.
> snd_pcm_do_stop => dpcm_fe_dai_trigger
> => dpcm_fe_dai_do_trigger
> => dpcm_be_dai_trigger
> => dpcm_do_trigger
> => soc_pcm_trigger
> => skl_platform_pcm_trigger
> skl_platform_pcm_trigger accesses the freed old runtime data and
> kernel panic.
>
> The patch fixes it by assigning be_substream->runtime in
> dpcm_be_dai_startup when be's state is START.

Can I ask on which kernel this patch was validated and on what platform?

We've done a lot of work since last year on DPCM states, and I wonder
the problem mentioned above actually exists on recent kernels.

Specifically, if the FE is closed, I don't get how the BE is not closed
as well. And if this problem is found on a recent kernel, then it should
be seen in the AVS driver as well, no?


> Signed-off-by: xiao jin <[email protected]>
> Signed-off-by: Zhang Yanmin <[email protected]>
> Signed-off-by: Eugeniu Rosca <[email protected]>
> ---
> sound/soc/soc-pcm.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c
> index 4f60c0a83311..6ca1d02065ce 100644
> --- a/sound/soc/soc-pcm.c
> +++ b/sound/soc/soc-pcm.c
> @@ -1608,6 +1608,8 @@ int dpcm_be_dai_startup(struct snd_soc_pcm_runtime *fe, int stream)
> if (be->dpcm[stream].users++ != 0)
> continue;
>
> + be_substream->runtime = be->dpcm[stream].runtime;
> +
> if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_NEW) &&
> (be->dpcm[stream].state != SND_SOC_DPCM_STATE_CLOSE))
> continue;
> @@ -1615,7 +1617,6 @@ int dpcm_be_dai_startup(struct snd_soc_pcm_runtime *fe, int stream)
> dev_dbg(be->dev, "ASoC: open %s BE %s\n",
> stream ? "capture" : "playback", be->dai_link->name);
>
> - be_substream->runtime = be->dpcm[stream].runtime;
> err = __soc_pcm_open(be, be_substream);
> if (err < 0) {
> be->dpcm[stream].users--;


2022-09-27 13:51:17

by Eugeniu Rosca

[permalink] [raw]
Subject: Re: [PATCH] ASoC: soc-pcm: fix fe and be race when accessing substream->runtime

Hi Pierre,

On Di, Sep 27, 2022 at 09:51:46 +0200, Pierre-Louis Bossart wrote:
> On 9/26/22 18:35, Eugeniu Rosca wrote:
> > From: xiao jin <[email protected]>
> >
> > After start of fe and be, fe might go to close without triggering
> > STOP, and substream->runtime is freed. However, be is still at
> > START state and its substream->runtime still points to the
> > freed runtime.
> >
> > Later on, FE is opened/started again, and triggers STOP.
> > snd_pcm_do_stop => dpcm_fe_dai_trigger
> > => dpcm_fe_dai_do_trigger
> > => dpcm_be_dai_trigger
> > => dpcm_do_trigger
> > => soc_pcm_trigger
> > => skl_platform_pcm_trigger
> > skl_platform_pcm_trigger accesses the freed old runtime data and
> > kernel panic.
> >
> > The patch fixes it by assigning be_substream->runtime in
> > dpcm_be_dai_startup when be's state is START.
>
> Can I ask on which kernel this patch was validated and on what platform?

As shared with Czarek in https://lore.kernel.org/alsa-devel/20220927110022.GA3802@lxhi-065/ ,
this patch was originally extracted from the out-of-tree Intel Apollo
Lake v4.1 KNL releases, hence it was validated on Intel ref.boards.

No re-testing/re-validation has been performed on the latest vanilla.

One of the goals behind submitting the patch is getting in touch
with the original authors, as well as the members of alsa-devel,
to assess if the patch is still relevant.

>
> We've done a lot of work since last year on DPCM states,

Could you please feedback if the work on the DPCM states is
pre- or post-v5.10?


> and I wonder
> the problem mentioned above actually exists on recent kernels.
>
> Specifically, if the FE is closed, I don't get how the BE is not closed
> as well. And if this problem is found on a recent kernel, then it should
> be seen in the AVS driver as well, no?

It is totally conceivable (if not very likely) that the mainline
advancements in the sound subsystem make this patch obsolete.

I would be happy if that's the final outcome of our discussion
(since this will allow dropping the patch in our downstream kernel).

Best Regards,
Eugeniu

2022-09-28 09:34:48

by Pierre-Louis Bossart

[permalink] [raw]
Subject: Re: [PATCH] ASoC: soc-pcm: fix fe and be race when accessing substream->runtime



On 9/27/22 14:30, Eugeniu Rosca wrote:
> Hi Pierre,
>
> On Di, Sep 27, 2022 at 09:51:46 +0200, Pierre-Louis Bossart wrote:
>> On 9/26/22 18:35, Eugeniu Rosca wrote:
>>> From: xiao jin <[email protected]>
>>>
>>> After start of fe and be, fe might go to close without triggering
>>> STOP, and substream->runtime is freed. However, be is still at
>>> START state and its substream->runtime still points to the
>>> freed runtime.
>>>
>>> Later on, FE is opened/started again, and triggers STOP.
>>> snd_pcm_do_stop => dpcm_fe_dai_trigger
>>> => dpcm_fe_dai_do_trigger
>>> => dpcm_be_dai_trigger
>>> => dpcm_do_trigger
>>> => soc_pcm_trigger
>>> => skl_platform_pcm_trigger
>>> skl_platform_pcm_trigger accesses the freed old runtime data and
>>> kernel panic.
>>>
>>> The patch fixes it by assigning be_substream->runtime in
>>> dpcm_be_dai_startup when be's state is START.
>>
>> Can I ask on which kernel this patch was validated and on what platform?
>
> As shared with Czarek in https://lore.kernel.org/alsa-devel/20220927110022.GA3802@lxhi-065/ ,
> this patch was originally extracted from the out-of-tree Intel Apollo
> Lake v4.1 KNL releases, hence it was validated on Intel ref.boards.
>
> No re-testing/re-validation has been performed on the latest vanilla.

There's no way to predict how a patch for a kernel 4.1 - released 7
years ago - would behave with a new kernel. If it's not tested it cannot
be merged.

> One of the goals behind submitting the patch is getting in touch
> with the original authors, as well as the members of alsa-devel,
> to assess if the patch is still relevant.

The only thing we could do is have more clarity on the test case and try
to reproduce it.

>>
>> We've done a lot of work since last year on DPCM states,
>
> Could you please feedback if the work on the DPCM states is
> pre- or post-v5.10?

It doesn't matter for this discussion on the upstream kernel. But yes
it's post v5.10.

>
>> and I wonder
>> the problem mentioned above actually exists on recent kernels.
>>
>> Specifically, if the FE is closed, I don't get how the BE is not closed
>> as well. And if this problem is found on a recent kernel, then it should
>> be seen in the AVS driver as well, no?
>
> It is totally conceivable (if not very likely) that the mainline
> advancements in the sound subsystem make this patch obsolete.
>
> I would be happy if that's the final outcome of our discussion
> (since this will allow dropping the patch in our downstream kernel).
>
> Best Regards,
> Eugeniu

2022-09-29 17:21:42

by Eugeniu Rosca

[permalink] [raw]
Subject: Re: [PATCH] ASoC: soc-pcm: fix fe and be race when accessing substream->runtime

Hi Pierre,

On Mi, Sep 28, 2022 at 10:36:32 +0200, Pierre-Louis Bossart wrote:
>
>
> On 9/27/22 14:30, Eugeniu Rosca wrote:
> > Hi Pierre,
> >
> > On Di, Sep 27, 2022 at 09:51:46 +0200, Pierre-Louis Bossart wrote:
> >> On 9/26/22 18:35, Eugeniu Rosca wrote:
> >>> From: xiao jin <[email protected]>
> >>>
> >>> After start of fe and be, fe might go to close without triggering
> >>> STOP, and substream->runtime is freed. However, be is still at
> >>> START state and its substream->runtime still points to the
> >>> freed runtime.
> >>>
> >>> Later on, FE is opened/started again, and triggers STOP.
> >>> snd_pcm_do_stop => dpcm_fe_dai_trigger
> >>> => dpcm_fe_dai_do_trigger
> >>> => dpcm_be_dai_trigger
> >>> => dpcm_do_trigger
> >>> => soc_pcm_trigger
> >>> => skl_platform_pcm_trigger
> >>> skl_platform_pcm_trigger accesses the freed old runtime data and
> >>> kernel panic.
> >>>
> >>> The patch fixes it by assigning be_substream->runtime in
> >>> dpcm_be_dai_startup when be's state is START.
> >>
> >> Can I ask on which kernel this patch was validated and on what platform?
> >
> > As shared with Czarek,
> > this patch was originally extracted from the out-of-tree Intel Apollo
> > Lake v4.1 KNL releases, hence it was validated on Intel ref.boards.
> >
> > No re-testing/re-validation has been performed on the latest vanilla.
>
> There's no way to predict how a patch for a kernel 4.1 - released 7
> years ago - would behave with a new kernel. If it's not tested it cannot
> be merged.

No disagreement here :)

>
> > One of the goals behind submitting the patch is getting in touch
> > with the original authors, as well as the members of alsa-devel,
> > to assess if the patch is still relevant.
>
> The only thing we could do is have more clarity on the test case and try
> to reproduce it.

Agreed. As soon as a test-case pops up where the patch makes a
difference in the runtime behavior, you will hear back from us.

>
> >>
> >> We've done a lot of work since last year on DPCM states,
> >
> > Could you please feedback if the work on the DPCM states is
> > pre- or post-v5.10?
>
> It doesn't matter for this discussion on the upstream kernel. But yes
> it's post v5.10.

Thanks. This is helpful in the downstream context.

>
> >
> >> and I wonder
> >> the problem mentioned above actually exists on recent kernels.
> >>
> >> Specifically, if the FE is closed, I don't get how the BE is not closed
> >> as well. And if this problem is found on a recent kernel, then it should
> >> be seen in the AVS driver as well, no?
> >
> > It is totally conceivable (if not very likely) that the mainline
> > advancements in the sound subsystem make this patch obsolete.
> >
> > I would be happy if that's the final outcome of our discussion
> > (since this will allow dropping the patch in our downstream kernel).

Best Regards
Eugeniu