2022-04-07 20:23:58

by Kai-Heng Feng

[permalink] [raw]
Subject: [PATCH] drm/amdgpu: Ensure HDA function is suspended before ASIC reset

DP/HDMI audio on AMD PRO VII stops working after S3:
[ 149.450391] amdgpu 0000:63:00.0: amdgpu: MODE1 reset
[ 149.450395] amdgpu 0000:63:00.0: amdgpu: GPU mode1 reset
[ 149.450494] amdgpu 0000:63:00.0: amdgpu: GPU psp mode1 reset
[ 149.983693] snd_hda_intel 0000:63:00.1: refused to change power state from D0 to D3hot
[ 150.003439] amdgpu 0000:63:00.0: refused to change power state from D0 to D3hot
...
[ 155.432975] snd_hda_intel 0000:63:00.1: CORB reset timeout#2, CORBRP = 65535

The offending commit is daf8de0874ab5b ("drm/amdgpu: always reset the asic in
suspend (v2)"). Commit 34452ac3038a7 ("drm/amdgpu: don't use BACO for
reset in S3 ") doesn't help, so the issue is something different.

Assuming that to make HDA resume to D0 fully realized, it needs to be
successfully put to D3 first. And this guesswork proves working, by
moving amdgpu_asic_reset() to noirq callback, so it's called after HDA
function is in D3.

Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)")
Signed-off-by: Kai-Heng Feng <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index bb1c025d90019..31f7229e7ea89 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2323,18 +2323,23 @@ static int amdgpu_pmops_suspend(struct device *dev)
{
struct drm_device *drm_dev = dev_get_drvdata(dev);
struct amdgpu_device *adev = drm_to_adev(drm_dev);
- int r;

if (amdgpu_acpi_is_s0ix_active(adev))
adev->in_s0ix = true;
else
adev->in_s3 = true;
- r = amdgpu_device_suspend(drm_dev, true);
- if (r)
- return r;
+ return amdgpu_device_suspend(drm_dev, true);
+}
+
+static int amdgpu_pmops_suspend_noirq(struct device *dev)
+{
+ struct drm_device *drm_dev = dev_get_drvdata(dev);
+ struct amdgpu_device *adev = drm_to_adev(drm_dev);
+
if (!adev->in_s0ix)
- r = amdgpu_asic_reset(adev);
- return r;
+ return amdgpu_asic_reset(adev);
+
+ return 0;
}

static int amdgpu_pmops_resume(struct device *dev)
@@ -2575,6 +2580,7 @@ static const struct dev_pm_ops amdgpu_pm_ops = {
.prepare = amdgpu_pmops_prepare,
.complete = amdgpu_pmops_complete,
.suspend = amdgpu_pmops_suspend,
+ .suspend_noirq = amdgpu_pmops_suspend_noirq,
.resume = amdgpu_pmops_resume,
.freeze = amdgpu_pmops_freeze,
.thaw = amdgpu_pmops_thaw,
--
2.34.1


2022-04-07 20:41:40

by Mario Limonciello

[permalink] [raw]
Subject: RE: [PATCH] drm/amdgpu: Ensure HDA function is suspended before ASIC reset

[Public]

> -----Original Message-----
> From: Alex Deucher <[email protected]>
> Sent: Thursday, April 7, 2022 12:08
> To: Kai-Heng Feng <[email protected]>
> Cc: Deucher, Alexander <[email protected]>; Koenig, Christian
> <[email protected]>; Pan, Xinhui <[email protected]>; David
> Airlie <[email protected]>; Maling list - DRI developers <dri-
> [email protected]>; LKML <[email protected]>; amd-
> gfx list <[email protected]>; Chiu, Solomon
> <[email protected]>; Limonciello, Mario
> <[email protected]>; Quan, Evan <[email protected]>;
> Tuikov, Luben <[email protected]>
> Subject: Re: [PATCH] drm/amdgpu: Ensure HDA function is suspended
> before ASIC reset
>
> On Thu, Apr 7, 2022 at 8:21 AM Kai-Heng Feng
> <[email protected]> wrote:
> >
> > DP/HDMI audio on AMD PRO VII stops working after S3:
> > [ 149.450391] amdgpu 0000:63:00.0: amdgpu: MODE1 reset
> > [ 149.450395] amdgpu 0000:63:00.0: amdgpu: GPU mode1 reset
> > [ 149.450494] amdgpu 0000:63:00.0: amdgpu: GPU psp mode1 reset
> > [ 149.983693] snd_hda_intel 0000:63:00.1: refused to change power state
> from D0 to D3hot
> > [ 150.003439] amdgpu 0000:63:00.0: refused to change power state from
> D0 to D3hot
> > ...
> > [ 155.432975] snd_hda_intel 0000:63:00.1: CORB reset timeout#2, CORBRP
> = 65535
>
> As an aside, shouldn't device links order this properly already? I
> thought that was the whole point of them. We have quirks in PCI
> quirks.c to create device links for all GPU integrated peripherals
> (audio, usb, ucsi).

The movement from D0->D3 only happens in noirq though in pci_pm_suspend_noirq.
So if device links only order within a suspend phase then resetting during that
phase means the future phase won't have ground to stand on.

IOW I think device links + this commit are needed to get the order right with a reset.

>
> Alex
>
> >
> > The offending commit is daf8de0874ab5b ("drm/amdgpu: always reset the
> asic in
> > suspend (v2)"). Commit 34452ac3038a7 ("drm/amdgpu: don't use BACO for
> > reset in S3 ") doesn't help, so the issue is something different.
> >
> > Assuming that to make HDA resume to D0 fully realized, it needs to be
> > successfully put to D3 first. And this guesswork proves working, by
> > moving amdgpu_asic_reset() to noirq callback, so it's called after HDA
> > function is in D3.
> >
> > Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend
> (v2)")
> > Signed-off-by: Kai-Heng Feng <[email protected]>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 18 ++++++++++++------
> > 1 file changed, 12 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > index bb1c025d90019..31f7229e7ea89 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > @@ -2323,18 +2323,23 @@ static int amdgpu_pmops_suspend(struct
> device *dev)
> > {
> > struct drm_device *drm_dev = dev_get_drvdata(dev);
> > struct amdgpu_device *adev = drm_to_adev(drm_dev);
> > - int r;
> >
> > if (amdgpu_acpi_is_s0ix_active(adev))
> > adev->in_s0ix = true;
> > else
> > adev->in_s3 = true;
> > - r = amdgpu_device_suspend(drm_dev, true);
> > - if (r)
> > - return r;
> > + return amdgpu_device_suspend(drm_dev, true);
> > +}
> > +
> > +static int amdgpu_pmops_suspend_noirq(struct device *dev)
> > +{
> > + struct drm_device *drm_dev = dev_get_drvdata(dev);
> > + struct amdgpu_device *adev = drm_to_adev(drm_dev);
> > +
> > if (!adev->in_s0ix)
> > - r = amdgpu_asic_reset(adev);
> > - return r;
> > + return amdgpu_asic_reset(adev);
> > +
> > + return 0;
> > }
> >
> > static int amdgpu_pmops_resume(struct device *dev)
> > @@ -2575,6 +2580,7 @@ static const struct dev_pm_ops amdgpu_pm_ops
> = {
> > .prepare = amdgpu_pmops_prepare,
> > .complete = amdgpu_pmops_complete,
> > .suspend = amdgpu_pmops_suspend,
> > + .suspend_noirq = amdgpu_pmops_suspend_noirq,
> > .resume = amdgpu_pmops_resume,
> > .freeze = amdgpu_pmops_freeze,
> > .thaw = amdgpu_pmops_thaw,
> > --
> > 2.34.1
> >

2022-04-07 20:57:12

by Alex Deucher

[permalink] [raw]
Subject: Re: [PATCH] drm/amdgpu: Ensure HDA function is suspended before ASIC reset

Applied. Thanks!

Alex

On Thu, Apr 7, 2022 at 8:21 AM Kai-Heng Feng
<[email protected]> wrote:
>
> DP/HDMI audio on AMD PRO VII stops working after S3:
> [ 149.450391] amdgpu 0000:63:00.0: amdgpu: MODE1 reset
> [ 149.450395] amdgpu 0000:63:00.0: amdgpu: GPU mode1 reset
> [ 149.450494] amdgpu 0000:63:00.0: amdgpu: GPU psp mode1 reset
> [ 149.983693] snd_hda_intel 0000:63:00.1: refused to change power state from D0 to D3hot
> [ 150.003439] amdgpu 0000:63:00.0: refused to change power state from D0 to D3hot
> ...
> [ 155.432975] snd_hda_intel 0000:63:00.1: CORB reset timeout#2, CORBRP = 65535
>
> The offending commit is daf8de0874ab5b ("drm/amdgpu: always reset the asic in
> suspend (v2)"). Commit 34452ac3038a7 ("drm/amdgpu: don't use BACO for
> reset in S3 ") doesn't help, so the issue is something different.
>
> Assuming that to make HDA resume to D0 fully realized, it needs to be
> successfully put to D3 first. And this guesswork proves working, by
> moving amdgpu_asic_reset() to noirq callback, so it's called after HDA
> function is in D3.
>
> Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)")
> Signed-off-by: Kai-Heng Feng <[email protected]>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 18 ++++++++++++------
> 1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index bb1c025d90019..31f7229e7ea89 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2323,18 +2323,23 @@ static int amdgpu_pmops_suspend(struct device *dev)
> {
> struct drm_device *drm_dev = dev_get_drvdata(dev);
> struct amdgpu_device *adev = drm_to_adev(drm_dev);
> - int r;
>
> if (amdgpu_acpi_is_s0ix_active(adev))
> adev->in_s0ix = true;
> else
> adev->in_s3 = true;
> - r = amdgpu_device_suspend(drm_dev, true);
> - if (r)
> - return r;
> + return amdgpu_device_suspend(drm_dev, true);
> +}
> +
> +static int amdgpu_pmops_suspend_noirq(struct device *dev)
> +{
> + struct drm_device *drm_dev = dev_get_drvdata(dev);
> + struct amdgpu_device *adev = drm_to_adev(drm_dev);
> +
> if (!adev->in_s0ix)
> - r = amdgpu_asic_reset(adev);
> - return r;
> + return amdgpu_asic_reset(adev);
> +
> + return 0;
> }
>
> static int amdgpu_pmops_resume(struct device *dev)
> @@ -2575,6 +2580,7 @@ static const struct dev_pm_ops amdgpu_pm_ops = {
> .prepare = amdgpu_pmops_prepare,
> .complete = amdgpu_pmops_complete,
> .suspend = amdgpu_pmops_suspend,
> + .suspend_noirq = amdgpu_pmops_suspend_noirq,
> .resume = amdgpu_pmops_resume,
> .freeze = amdgpu_pmops_freeze,
> .thaw = amdgpu_pmops_thaw,
> --
> 2.34.1
>

2022-04-07 21:21:17

by Alex Deucher

[permalink] [raw]
Subject: Re: [PATCH] drm/amdgpu: Ensure HDA function is suspended before ASIC reset

On Thu, Apr 7, 2022 at 8:21 AM Kai-Heng Feng
<[email protected]> wrote:
>
> DP/HDMI audio on AMD PRO VII stops working after S3:
> [ 149.450391] amdgpu 0000:63:00.0: amdgpu: MODE1 reset
> [ 149.450395] amdgpu 0000:63:00.0: amdgpu: GPU mode1 reset
> [ 149.450494] amdgpu 0000:63:00.0: amdgpu: GPU psp mode1 reset
> [ 149.983693] snd_hda_intel 0000:63:00.1: refused to change power state from D0 to D3hot
> [ 150.003439] amdgpu 0000:63:00.0: refused to change power state from D0 to D3hot
> ...
> [ 155.432975] snd_hda_intel 0000:63:00.1: CORB reset timeout#2, CORBRP = 65535

As an aside, shouldn't device links order this properly already? I
thought that was the whole point of them. We have quirks in PCI
quirks.c to create device links for all GPU integrated peripherals
(audio, usb, ucsi).

Alex

>
> The offending commit is daf8de0874ab5b ("drm/amdgpu: always reset the asic in
> suspend (v2)"). Commit 34452ac3038a7 ("drm/amdgpu: don't use BACO for
> reset in S3 ") doesn't help, so the issue is something different.
>
> Assuming that to make HDA resume to D0 fully realized, it needs to be
> successfully put to D3 first. And this guesswork proves working, by
> moving amdgpu_asic_reset() to noirq callback, so it's called after HDA
> function is in D3.
>
> Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)")
> Signed-off-by: Kai-Heng Feng <[email protected]>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 18 ++++++++++++------
> 1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index bb1c025d90019..31f7229e7ea89 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2323,18 +2323,23 @@ static int amdgpu_pmops_suspend(struct device *dev)
> {
> struct drm_device *drm_dev = dev_get_drvdata(dev);
> struct amdgpu_device *adev = drm_to_adev(drm_dev);
> - int r;
>
> if (amdgpu_acpi_is_s0ix_active(adev))
> adev->in_s0ix = true;
> else
> adev->in_s3 = true;
> - r = amdgpu_device_suspend(drm_dev, true);
> - if (r)
> - return r;
> + return amdgpu_device_suspend(drm_dev, true);
> +}
> +
> +static int amdgpu_pmops_suspend_noirq(struct device *dev)
> +{
> + struct drm_device *drm_dev = dev_get_drvdata(dev);
> + struct amdgpu_device *adev = drm_to_adev(drm_dev);
> +
> if (!adev->in_s0ix)
> - r = amdgpu_asic_reset(adev);
> - return r;
> + return amdgpu_asic_reset(adev);
> +
> + return 0;
> }
>
> static int amdgpu_pmops_resume(struct device *dev)
> @@ -2575,6 +2580,7 @@ static const struct dev_pm_ops amdgpu_pm_ops = {
> .prepare = amdgpu_pmops_prepare,
> .complete = amdgpu_pmops_complete,
> .suspend = amdgpu_pmops_suspend,
> + .suspend_noirq = amdgpu_pmops_suspend_noirq,
> .resume = amdgpu_pmops_resume,
> .freeze = amdgpu_pmops_freeze,
> .thaw = amdgpu_pmops_thaw,
> --
> 2.34.1
>