2022-09-27 07:57:48

by Cezary Rojewski

[permalink] [raw]
Subject: Re: [PATCH] ASoC: soc-pcm: fix fe and be race when accessing substream->runtime

On 2022-09-26 6:35 PM, Eugeniu Rosca wrote:
> From: xiao jin <[email protected]>
>
> After start of fe and be, fe might go to close without triggering
> STOP, and substream->runtime is freed. However, be is still at
> START state and its substream->runtime still points to the
> freed runtime.
>
> Later on, FE is opened/started again, and triggers STOP.
> snd_pcm_do_stop => dpcm_fe_dai_trigger
> => dpcm_fe_dai_do_trigger
> => dpcm_be_dai_trigger
> => dpcm_do_trigger
> => soc_pcm_trigger
> => skl_platform_pcm_trigger
> skl_platform_pcm_trigger accesses the freed old runtime data and
> kernel panic.
>
> The patch fixes it by assigning be_substream->runtime in
> dpcm_be_dai_startup when be's state is START.
>
> Signed-off-by: xiao jin <[email protected]>
> Signed-off-by: Zhang Yanmin <[email protected]>
> Signed-off-by: Eugeniu Rosca <[email protected]>


Hello,

The change seems to be driven by the skylake-driver problem. With all
due respect, why not ping owners of the driver first? There are some
crucial CCs missing.

I'd like to know more about the scenario you guys reproduced the problem
in. Configuration details and kernel base would be good to know too.
Since our CI did not detect problem of such sort, if the problem
actually exists, we would like to append a test or two to cover it later on.


Regards,
Czarek


2022-09-27 11:22:30

by Eugeniu Rosca

[permalink] [raw]
Subject: Re: [PATCH] ASoC: soc-pcm: fix fe and be race when accessing substream->runtime

Hello Czarek,

On Di, Sep 27, 2022 at 09:50:05 +0200, Cezary Rojewski wrote:
> On 2022-09-26 6:35 PM, Eugeniu Rosca wrote:
> >From: xiao jin <[email protected]>
> >
> >After start of fe and be, fe might go to close without triggering
> >STOP, and substream->runtime is freed. However, be is still at
> >START state and its substream->runtime still points to the
> >freed runtime.
> >
> >Later on, FE is opened/started again, and triggers STOP.
> >snd_pcm_do_stop => dpcm_fe_dai_trigger
> > => dpcm_fe_dai_do_trigger
> > => dpcm_be_dai_trigger
> > => dpcm_do_trigger
> > => soc_pcm_trigger
> > => skl_platform_pcm_trigger
> >skl_platform_pcm_trigger accesses the freed old runtime data and
> >kernel panic.
> >
> >The patch fixes it by assigning be_substream->runtime in
> >dpcm_be_dai_startup when be's state is START.
> >
> >Signed-off-by: xiao jin <[email protected]>
> >Signed-off-by: Zhang Yanmin <[email protected]>
> >Signed-off-by: Eugeniu Rosca <[email protected]>
>
> Hello,
>
> The change seems to be driven by the skylake-driver problem.

Agreed, based on the author/co-signer's e-mail and the call stack.

> With all due
> respect, why not ping owners of the driver first? There are some crucial CCs
> missing.

Some feedback already provided by Pierre-Louis Bossart (many thanks).
Cc-ing more Intel contributors in the sound subsystem.

>
> I'd like to know more about the scenario you guys reproduced the problem in.

This patch was originally identified in the Intel Apollo Lake v4.1 KNLs.
Given that the change itself is in the core sound subsystem, our internal
assessment was that the patch might potentially be relevant/helpful
on other HW platforms.

Our intention is to confirm or invalidate this assumption with the
original developers of the patch, as well as with the audio maintainers
and the members of the alsa-devel ML.

> Configuration details and kernel base would be good to know too. Since our
> CI did not detect problem of such sort, if the problem actually exists, we
> would like to append a test or two to cover it later on.

If there is no evidence that the patch is fixing a real-life issue
occurring in the latest vanilla, I agree to drop the patch.

So far, I do not possess this evidence myself.

> Regards,
> Czarek

Best regards,
Eugeniu

2022-09-28 14:52:44

by Cezary Rojewski

[permalink] [raw]
Subject: Re: [PATCH] ASoC: soc-pcm: fix fe and be race when accessing substream->runtime

On 2022-09-27 1:00 PM, Eugeniu Rosca wrote:
> Hello Czarek,

...

>> I'd like to know more about the scenario you guys reproduced the problem in.
>
> This patch was originally identified in the Intel Apollo Lake v4.1 KNLs.
> Given that the change itself is in the core sound subsystem, our internal
> assessment was that the patch might potentially be relevant/helpful
> on other HW platforms.
>
> Our intention is to confirm or invalidate this assumption with the
> original developers of the patch, as well as with the audio maintainers
> and the members of the alsa-devel ML.
>
>> Configuration details and kernel base would be good to know too. Since our
>> CI did not detect problem of such sort, if the problem actually exists, we
>> would like to append a test or two to cover it later on.
>
> If there is no evidence that the patch is fixing a real-life issue
> occurring in the latest vanilla, I agree to drop the patch.
>
> So far, I do not possess this evidence myself.


I've spent some time to locate the change. Found it and it seems
obsolete. Some tags are missing in the revision of yours and the
original date does not match either - it's Apr 2018 for the original.
Won't be mentioning the tags as some engineers no longer bear @intel.com.

soc-pcm and skylake-driver valuable bits from those trees are already
part of the upstream. Most of what is left was later proven obsolete or
redundant by my or Pierre's engineers. There seems to be no patch
missing except for few fixes from the recent SKL/KBL up-revs for our
clients. Nothing APL specific.

Following kernels related to APL are maintained by the IPG team from
software perspective:
4.1.42, 4.1.49, 4.4, 4.9, 4.14, 4.19

Multiple OSes. And then there are flavors for kernels/OS both. It's
quite likely kernel base of yours fits into one of these buckets or at
least have had changes ported from one of them.

TLDR: I agree here with my colleagues - if you believe the change is
necessary, a proof e.g.: in form of reproduction steps, is needed.
Otherwise it's no-go. Happy to hop on a call should you need any
additional information.


Regards,
Czarek

2022-09-29 17:21:22

by Eugeniu Rosca

[permalink] [raw]
Subject: Re: [PATCH] ASoC: soc-pcm: fix fe and be race when accessing substream->runtime

Hello Czarek,

Thank you for your friendly feedback.

On Mi, Sep 28, 2022 at 04:24:43 +0200, Cezary Rojewski wrote:
> On 2022-09-27 1:00 PM, Eugeniu Rosca wrote:
> >Hello Czarek,
>
> ...
>
> >>I'd like to know more about the scenario you guys reproduced the problem in.
> >
> >This patch was originally identified in the Intel Apollo Lake v4.1 KNLs.
> >Given that the change itself is in the core sound subsystem, our internal
> >assessment was that the patch might potentially be relevant/helpful
> >on other HW platforms.
> >
> >Our intention is to confirm or invalidate this assumption with the
> >original developers of the patch, as well as with the audio maintainers
> >and the members of the alsa-devel ML.
> >
> >>Configuration details and kernel base would be good to know too. Since our
> >>CI did not detect problem of such sort, if the problem actually exists, we
> >>would like to append a test or two to cover it later on.
> >
> >If there is no evidence that the patch is fixing a real-life issue
> >occurring in the latest vanilla, I agree to drop the patch.
> >
> >So far, I do not possess this evidence myself.
>
>
> I've spent some time to locate the change. Found it and it seems obsolete.
> Some tags are missing in the revision of yours and the original date does
> not match either - it's Apr 2018 for the original. Won't be mentioning the
> tags as some engineers no longer bear @intel.com.
>
> soc-pcm and skylake-driver valuable bits from those trees are already part
> of the upstream. Most of what is left was later proven obsolete or redundant
> by my or Pierre's engineers. There seems to be no patch missing except for
> few fixes from the recent SKL/KBL up-revs for our clients. Nothing APL
> specific.

Thanks for this thorough check. That also gives us enough confidence
to drop the patch in some of our downstream kernels.

>
> Following kernels related to APL are maintained by the IPG team from
> software perspective:
> 4.1.42, 4.1.49, 4.4, 4.9, 4.14, 4.19
>
> Multiple OSes. And then there are flavors for kernels/OS both. It's quite
> likely kernel base of yours fits into one of these buckets or at least have
> had changes ported from one of them.

Good to know and yes, you are right w.r.t. the origin of the patch.

>
> TLDR: I agree here with my colleagues - if you believe the change is
> necessary, a proof e.g.: in form of reproduction steps, is needed. Otherwise
> it's no-go. Happy to hop on a call should you need any additional
> information.

That's a very kind attitude and we will definitely share any empirical
evidence if it turns out the patch is really contributing with healing
of any future runtime issues.

>
> Regards,
> Czarek

Best Regards
Eugeniu