LinuxLists.cc - [PATCH 0/1] Fiji GPU audio register timeout when in BACO state

2020-04-26 16:04:07

Subject: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state

Hi all,

Since Linux v5.7-rc1 / commit 4fdda2e66de0 ("drm/amdgpu/runpm: enable
runpm on baco capable VI+ asics"), my AMD R9 Nano has been using runpm /
BACO. You can tell visually when it sleeps, because the fan on the
graphics card is switched off to save power. It did not spin down the
fan in v5.6.x.

This is great (I love it), except that when it is sleeping, the PCIe
audio function of the GPU has issues if anything tries to access it. You
get dmesg errors such as these:

snd_hda_intel 0000:08:00.1: spurious response 0x0:0x0, last cmd=0x170500
snd_hda_intel 0000:08:00.1: azx_get_response timeout, switching to polling mode: last cmd=0x001f0500
snd_hda_intel 0000:08:00.1: No response from codec, disabling MSI: last cmd=0x001f0500
snd_hda_intel 0000:08:00.1: No response from codec, resetting bus: last cmd=0x001f0500
snd_hda_codec_hdmi hdaudioC1D0: Unable to sync register 0x2f0d00. -11

The above is with the Fiji XT GPU at 0000:08:00.0 in a Thunderbolt
enclosure (not that Thunderbolt should affect it, but I feel I should
mention it just in case). I dropped a lot of duplicate dmesg lines, as
some of them repeated a lot of times before the driver gave up.

I offer this patch to disable runpm for Fiji while a fix is found, if
you decide that is the best approach. Regardless, I will gladly test any
patches you come up with instead and confirm that the above issue has
been fixed.

I cannot tell if any other GPUs are affected. The only other cards to
which I have access are a couple of AMD R9 280X (Tahiti XT), which use
the radeon driver instead of the amdgpu driver.

Kind regards,
Nicholas Johnson

Nicholas Johnson (1):
drm/amdgpu/runpm: Disable runpm on Fiji due to audio register timeout

drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 1 +
1 file changed, 1 insertion(+)

--
2.26.2

2020-04-26 16:04:54

by Nicholas Johnson

Subject: [PATCH 1/1] drm/amdgpu/runpm: Disable runpm on Fiji due to audio register timeout

Since commit 4fdda2e66de0 ("drm/amdgpu/runpm: enable runpm on baco
capable VI+ asics"), runpm has been enabled on AMD Fiji GPUs. This
allows the GPU to enter BACO state, as evidenced by the fan on the
graphics card turning off. When it is in this state, accesses to the
registers of the PCIe audio function on the GPU time out, leading to
dmesg errors such as the following:

snd_hda_intel 0000:08:00.1: spurious response 0x0:0x0, last cmd=0x170500
snd_hda_intel 0000:08:00.1: azx_get_response timeout, switching to polling mode: last cmd=0x001f0500
snd_hda_intel 0000:08:00.1: No response from codec, disabling MSI: last cmd=0x001f0500
snd_hda_intel 0000:08:00.1: No response from codec, resetting bus: last cmd=0x001f0500
snd_hda_codec_hdmi hdaudioC1D0: Unable to sync register 0x2f0d00. -11

Pending a fix for the above problem, disable runpm on Fiji.

Signed-off-by: Nicholas Johnson <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index fd1dc3236..cbb55d2f1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -172,6 +172,7 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
 	else if (amdgpu_device_supports_baco(dev) &&
 		 (amdgpu_runtime_pm != 0) &&
 		 (adev->asic_type >= CHIP_TOPAZ) &&
+		 (adev->asic_type != CHIP_FIJI) &&
 		 (adev->asic_type != CHIP_VEGA10) &&
 		 (adev->asic_type != CHIP_VEGA20) &&
 		 (adev->asic_type != CHIP_ARCTURUS)) /* enable runpm on VI+ */
--
2.26.2
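
[Editor's sketch] The condition touched by the patch above can be modelled as a standalone predicate, which makes it easier to see which ASICs end up with runpm enabled through BACO. This is only an illustration: the enum is an abbreviated, hypothetical subset of the kernel's ASIC list (only the relative order matters here), and runpm_allowed_via_baco() is a made-up helper name, not a function in amdgpu.

```c
#include <stdbool.h>

/*
 * Abbreviated, illustrative subset of ASIC identifiers. The kernel's real
 * enum has many more entries; only the relative order is relied on here.
 */
enum asic_type {
	CHIP_HAWAII,    /* example of a pre-VI part */
	CHIP_TOPAZ,     /* first VI part, lower bound of the check */
	CHIP_TONGA,
	CHIP_FIJI,
	CHIP_POLARIS10,
	CHIP_VEGA10,
	CHIP_VEGA20,
	CHIP_ARCTURUS,
};

/*
 * Hypothetical helper mirroring the patched else-if branch:
 * supports_baco stands in for amdgpu_device_supports_baco(dev),
 * runtime_pm_param for the amdgpu_runtime_pm module parameter.
 */
static bool runpm_allowed_via_baco(bool supports_baco, int runtime_pm_param,
				   enum asic_type asic)
{
	return supports_baco &&
	       runtime_pm_param != 0 &&
	       asic >= CHIP_TOPAZ &&
	       asic != CHIP_FIJI &&      /* the exclusion added by this patch */
	       asic != CHIP_VEGA10 &&
	       asic != CHIP_VEGA20 &&
	       asic != CHIP_ARCTURUS;
}
```

With the patch applied, calling the helper with CHIP_FIJI returns false regardless of the other arguments, while a neighbouring VI part such as CHIP_TONGA is still accepted when BACO is supported and the parameter is non-zero.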

2020-04-27 14:24:28

by Deucher, Alexander

Subject: RE: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state

[AMD Public Use]

> -----Original Message-----
> From: Nicholas Johnson <[email protected]>
> Sent: Sunday, April 26, 2020 12:02 PM
> To: [email protected]
> Cc: Deucher, Alexander <[email protected]>; Koenig, Christian
> <[email protected]>; Zhou, David(ChunMing)
> <[email protected]>; Nicholas Johnson <nicholas.johnson-
> [email protected]>
> Subject: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state
>
> Hi all,
>
> Since Linux v5.7-rc1 / commit 4fdda2e66de0 ("drm/amdgpu/runpm: enable
> runpm on baco capable VI+ asics"), my AMD R9 Nano has been using runpm /
> BACO. You can tell visually when it sleeps, because the fan on the graphics
> card is switched off to save power. It did not spin down the fan in v5.6.x.
>
> This is great (I love it), except that when it is sleeping, the PCIe audio function
> of the GPU has issues if anything tries to access it. You get dmesg errors such
> as these:
>
> snd_hda_intel 0000:08:00.1: spurious response 0x0:0x0, last cmd=0x170500
> snd_hda_intel 0000:08:00.1: azx_get_response timeout, switching to polling mode: last cmd=0x001f0500
> snd_hda_intel 0000:08:00.1: No response from codec, disabling MSI: last cmd=0x001f0500
> snd_hda_intel 0000:08:00.1: No response from codec, resetting bus: last cmd=0x001f0500
> snd_hda_codec_hdmi hdaudioC1D0: Unable to sync register 0x2f0d00. -11
>
> The above is with the Fiji XT GPU at 0000:08:00.0 in a Thunderbolt enclosure
> (not that Thunderbolt should affect it, but I feel I should mention it just in
> case). I dropped a lot of duplicate dmesg lines, as some of them repeated a
> lot of times before the driver gave up.
>
> I offer this patch to disable runpm for Fiji while a fix is found, if you decide
> that is the best approach. Regardless, I will gladly test any patches you come
> up with instead and confirm that the above issue has been fixed.
>
> I cannot tell if any other GPUs are affected. The only other cards to which I
> have access are a couple of AMD R9 280X (Tahiti XT), which use radeon driver
> instead of amdgpu driver.

Adding a few more people. Do you know what is accessing the audio? The audio should have a dependency on the GPU device. The GPU won't enter runtime pm until the audio has entered runtime pm and vice versa on resume. Please attach a copy of your dmesg output and lspci output.

Alex
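
[Editor's sketch] One way to observe the dependency described above is to compare the runtime PM state of the GPU function and its audio function through the standard sysfs attribute power/runtime_status. The helper below is a minimal sketch, not part of any driver: read_runtime_status() is a hypothetical name, and the sysfs root is passed in as a parameter only so the function can be exercised against a different directory.

```c
#include <stdio.h>
#include <string.h>

/*
 * Reads <sysfs_root>/bus/pci/devices/<bdf>/power/runtime_status into out,
 * with the trailing newline stripped.
 * Returns 0 on success, -1 if the path is too long or cannot be read.
 */
static int read_runtime_status(const char *sysfs_root, const char *bdf,
			       char *out, size_t out_len)
{
	char path[256];
	FILE *f;
	int n;

	if (out_len == 0)
		return -1;

	n = snprintf(path, sizeof(path),
		     "%s/bus/pci/devices/%s/power/runtime_status",
		     sysfs_root, bdf);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (!fgets(out, (int)out_len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	out[strcspn(out, "\n")] = '\0';
	return 0;
}
```

For the setup in this thread, it would be called with "/sys" as the root and "0000:08:00.0" (GPU) or "0000:08:00.1" (audio) as the device address. If the dependency works as described, the GPU should not report "suspended" while the audio function still reports "active".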

2020-04-27 15:20:09

by Takashi Iwai

Subject: Re: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state

On Mon, 27 Apr 2020 16:22:21 +0200,
Deucher, Alexander wrote:
>
> [AMD Public Use]
>
> > -----Original Message-----
> > From: Nicholas Johnson <[email protected]>
> > Sent: Sunday, April 26, 2020 12:02 PM
> > To: [email protected]
> > Cc: Deucher, Alexander <[email protected]>; Koenig, Christian
> > <[email protected]>; Zhou, David(ChunMing)
> > <[email protected]>; Nicholas Johnson <nicholas.johnson-
> > [email protected]>
> > Subject: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state
> >
> > Hi all,
> >
> > Since Linux v5.7-rc1 / commit 4fdda2e66de0 ("drm/amdgpu/runpm: enable
> > runpm on baco capable VI+ asics"), my AMD R9 Nano has been using runpm /
> > BACO. You can tell visually when it sleeps, because the fan on the graphics
> > card is switched off to save power. It did not spin down the fan in v5.6.x.
> >
> > This is great (I love it), except that when it is sleeping, the PCIe audio function
> > of the GPU has issues if anything tries to access it. You get dmesg errors such
> > as these:
> >
> > snd_hda_intel 0000:08:00.1: spurious response 0x0:0x0, last cmd=0x170500
> > snd_hda_intel 0000:08:00.1: azx_get_response timeout, switching to polling mode: last cmd=0x001f0500
> > snd_hda_intel 0000:08:00.1: No response from codec, disabling MSI: last cmd=0x001f0500
> > snd_hda_intel 0000:08:00.1: No response from codec, resetting bus: last cmd=0x001f0500
> > snd_hda_codec_hdmi hdaudioC1D0: Unable to sync register 0x2f0d00. -11
> >
> > The above is with the Fiji XT GPU at 0000:08:00.0 in a Thunderbolt enclosure
> > (not that Thunderbolt should affect it, but I feel I should mention it just in
> > case). I dropped a lot of duplicate dmesg lines, as some of them repeated a
> > lot of times before the driver gave up.
> >
> > I offer this patch to disable runpm for Fiji while a fix is found, if you decide
> > that is the best approach. Regardless, I will gladly test any patches you come
> > up with instead and confirm that the above issue has been fixed.
> >
> > I cannot tell if any other GPUs are affected. The only other cards to which I
> > have access are a couple of AMD R9 280X (Tahiti XT), which use radeon driver
> > instead of amdgpu driver.
>
> Adding a few more people. Do you know what is accessing the audio? The audio should have a dependency on the GPU device. The GPU won't enter runtime pm until the audio has entered runtime pm and vice versa on resume. Please attach a copy of your dmesg output and lspci output.

Also, please retest with the fresh 5.7-rc3. There was a known
regression regarding HD-audio PM in 5.7-rc1/rc2, and it's been fixed
there (commit 8d6762af302d).

thanks,

Takashi

2020-04-27 17:26:24

On Sat, 02 May 2020 09:27:31 +0200,
Takashi Iwai wrote:
>
> On Sat, 02 May 2020 09:17:28 +0200,
> Lukas Wunner wrote:
> >
> > On Sat, May 02, 2020 at 09:11:58AM +0200, Takashi Iwai wrote:
> > > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > > @@ -673,6 +673,12 @@ static int amdgpu_dm_audio_component_bind(struct device *kdev,
> > > struct amdgpu_device *adev = dev->dev_private;
> > > struct drm_audio_component *acomp = data;
> > >
> > > + if (!device_link_add(hda_kdev, kdev, DL_FLAG_STATELESS |
> > > + DL_FLAG_PM_RUNTIME)) {
> > > + DRM_ERROR("DM: cannot add device link to audio device\n");
> > > + return -ENOMEM;
> > > + }
> > > +
> >
> > Doesn't this duplicate drivers/pci/quirks.c:quirk_gpu_hda() ?
>
> Gah, you're right, that was the place I overlooked.
> It was a typical "false Eureka right-after-wakeup" phenomenon :)
> Need a vaccine aka coffee...
>
> So the runtime PM dependency must be already placed there, and the
> problem is not the lack of the dependency tree but the really other
> timing issue. Back to square.

One interesting test is to open the stream while the mode isn't set
yet and see whether the same problem appears.
Namely, after the monitor is connected but no mode is set, run
directly like
aplay -Dhdmi:1,0 foo.wav
You might need to wrap the command with pasuspender if PA is active.

Takashi

2020-05-06 15:21:00

by Nicholas Johnson

Subject: Re: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state

On Sat, May 02, 2020 at 12:09:13PM +0200, Takashi Iwai wrote:
> On Sat, 02 May 2020 09:27:31 +0200,
> Takashi Iwai wrote:
> >
> > On Sat, 02 May 2020 09:17:28 +0200,
> > Lukas Wunner wrote:
> > >
> > > On Sat, May 02, 2020 at 09:11:58AM +0200, Takashi Iwai wrote:
> > > > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > > > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > > > @@ -673,6 +673,12 @@ static int amdgpu_dm_audio_component_bind(struct device *kdev,
> > > > struct amdgpu_device *adev = dev->dev_private;
> > > > struct drm_audio_component *acomp = data;
> > > >
> > > > + if (!device_link_add(hda_kdev, kdev, DL_FLAG_STATELESS |
> > > > + DL_FLAG_PM_RUNTIME)) {
> > > > + DRM_ERROR("DM: cannot add device link to audio device\n");
> > > > + return -ENOMEM;
> > > > + }
> > > > +
> > >
> > > Doesn't this duplicate drivers/pci/quirks.c:quirk_gpu_hda() ?
> >
> > Gah, you're right, that was the place I overlooked.
> > It was a typical "false Eureka right-after-wakeup" phenomenon :)
> > Need a vaccine aka coffee...
> >
> > So the runtime PM dependency must be already placed there, and the
> > problem is not the lack of the dependency tree but the really other
> > timing issue. Back to square.
>
> One interesting test is to open the stream while the mode isn't set
> yet and see whether the same problem appears.
> Namely, after the monitor is connected but no mode is set, run
> directly like
> aplay -Dhdmi:1,0 foo.wav
> You might need to wrap the command with pasuspender if PA is active.

I could not work out how to specify the interface for aplay. The only
thing that worked was omitting it and letting aplay pick the default
device (which can change). I even used aplay -L and aplay -l to list the
devices, but I could not get it working.

Is there anything else I can try? I did not apply the last patch when it
was pointed out that it is already a quirk.

Regards,
Nicholas
>
>
> Takashi