Date: Tue, 28 Apr 2020 09:57:24 +0200
From: Takashi Iwai
To: Alex Deucher
Cc: Nicholas Johnson, "Zhou, David(ChunMing)", "alsa-devel@alsa-project.org", "linux-kernel@vger.kernel.org", "amd-gfx@lists.freedesktop.org", Takashi Iwai, Lukas Wunner, "Deucher, Alexander", "Koenig, Christian"
Subject: Re: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state

On Mon, 27 Apr 2020 20:43:54 +0200,
Alex Deucher wrote:
>
> On Mon, Apr 27, 2020 at 2:39 PM Takashi Iwai wrote:
> >
> > On Mon, 27 Apr 2020 20:28:12 +0200,
> > Alex Deucher wrote:
> > >
> > > On Mon, Apr 27, 2020 at 2:07 PM Nicholas Johnson
> > > wrote:
> > > >
> > > > On Mon, Apr 27, 2020 at 05:15:55PM +0200, Takashi Iwai wrote:
> > > > > On Mon, 27 Apr 2020 16:22:21 +0200,
> > > > > Deucher, Alexander wrote:
> > > > > >
> > > > > > [AMD Public Use]
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Nicholas Johnson
> > > > > > > Sent: Sunday, April 26, 2020 12:02 PM
> > > > > > > To: linux-kernel@vger.kernel.org
> > > > > > > Cc: Deucher, Alexander; Koenig, Christian;
> > > > > > > Zhou, David(ChunMing); Nicholas Johnson
> > > > > > > Subject: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state
> > > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > Since Linux v5.7-rc1 / commit 4fdda2e66de0 ("drm/amdgpu/runpm: enable
> > > > > > > runpm on baco capable VI+ asics"), my AMD R9 Nano has been using runpm /
> > > > > > > BACO. You can tell visually when it sleeps, because the fan on the graphics
> > > > > > > card is switched off to save power. It did not spin down the fan in v5.6.x.
> > > > > > >
> > > > > > > This is great (I love it), except that when it is sleeping, the PCIe audio function
> > > > > > > of the GPU has issues if anything tries to access it. You get dmesg errors such
> > > > > > > as these:
> > > > > > >
> > > > > > > snd_hda_intel 0000:08:00.1: spurious response 0x0:0x0, last cmd=0x170500
> > > > > > > snd_hda_intel 0000:08:00.1: azx_get_response timeout, switching to polling mode: last cmd=0x001f0500
> > > > > > > snd_hda_intel 0000:08:00.1: No response from codec, disabling MSI: last cmd=0x001f0500
> > > > > > > snd_hda_intel 0000:08:00.1: No response from codec, resetting bus: last cmd=0x001f0500
> > > > > > > snd_hda_codec_hdmi hdaudioC1D0: Unable to sync register 0x2f0d00. -11
> > > > > > >
> > > > > > > The above is with the Fiji XT GPU at 0000:08:00.0 in a Thunderbolt enclosure
> > > > > > > (not that Thunderbolt should affect it, but I feel I should mention it just in
> > > > > > > case). I dropped a lot of duplicate dmesg lines, as some of them repeated
> > > > > > > many times before the driver gave up.
> > > > > > >
> > > > > > > I offer this patch to disable runpm for Fiji while a fix is found, if you decide
> > > > > > > that is the best approach. Regardless, I will gladly test any patches you come
> > > > > > > up with instead and confirm that the above issue has been fixed.
> > > > > > >
> > > > > > > I cannot tell if any other GPUs are affected. The only other cards to which I
> > > > > > > have access are a couple of AMD R9 280X (Tahiti XT), which use the radeon driver
> > > > > > > instead of the amdgpu driver.
> > > > > >
> > > > > > Adding a few more people. Do you know what is accessing the audio?
> > > > > > The audio should have a dependency on the GPU device. The GPU won't
> > > > > > enter runtime pm until the audio has entered runtime pm and vice versa
> > > > > > on resume. Please attach a copy of your dmesg output and lspci output.
> > > >
> > > > pci 0000:08:00.1: D0 power state depends on 0000:08:00.0
> > > > The above must be the dependency of which you speak from dmesg.
> > > >
> > > > Accessing the audio? I did not have a single method for triggering it.
> > > > Sometimes it happened on shutdown. Sometimes when restarting gdm.
> > > > Sometimes when playing with audio settings in Cinnamon Desktop. But most
> > > > often when changing displays. It might have something to do with the
> > > > audio device associated with a monitor being created when the monitor is
> > > > found. If an audio device is created, then pulseaudio might touch it.
> > > > Sorry, this is a very verbose "not quite sure".
> > > >
> > > > To trigger the bug, this time I did the following:
> > > >
> > > > 1. Boot laptop without Fiji and log in
> > > >
> > > > 2. Attach Fiji via Thunderbolt (no displays attached to Fiji) and
> > > > approve Thunderbolt device
> > > >
> > > > 3. Log in again because the session gets killed when GPU is hot-added
> > > >
> > > > 4. Wait for Fiji to fall asleep (fan stops)
> > > >
> > > > 5. Open "dmesg -w" on laptop display
> > > >
> > > > 6. Attach display to DisplayPort on Fiji (it should still stay asleep)
> > > >
> > > > 7. Do WindowsKey+P to activate external display. The error appears in
> > > > the dmesg window at that instant.
> > > >
> > > > Could it be a race condition when waking the card up?
> > > >
> > > > I cannot get the graphics card fan to spin down if the Thunderbolt
> > > > enclosure is attached at boot time. It only does it if hot-added.
> > > >
> > > > If you think it will help, I can take out the Fiji and put it in a test
> > > > rig and try to replicate the issue without Thunderbolt, but it looks
> > > > like it will not spin the fan down if Fiji is attached at boot time.
> > > >
> > > > Question: why would the fan not spin down if Fiji is attached at boot
> > > > time, and how would one make the said fan turn off? Aside from being
> > > > useful for pinning down the audio register issue, I would like to make
> > > > sure the power savings are realised whenever the GPU is not being used.
> > >
> > > Presumably something is using the device. Maybe a framebuffer console
> > > or X? Or maybe something like tlp has disabled runtime pm on your
> > > device? You can see the current status by reading the files in
> > > /sys/class/drm/cardX/device/power/ . Replace cardX with card0, card1,
> > > etc. depending on which device is the radeon card.
> > >
> > > FWIW, I have a fiji board in a desktop system and it worked fine when
> > > this code was enabled.
> >
> > Is the new DC code used for Fiji boards? IIRC, the audio component
> > binding from amdgpu is enabled only for DC, and without the audio
> > component binding the runtime PM won't be linked up, hence you can't
> > power up the GPU automatically from an access on the audio side.
> >
>
> Yes, DC is enabled by default for all cards with runtime pm enabled.

OK, thanks, I found in the dmesg output that amdgpu got bound via the
audio component, too:

[ 21.294927] snd_hda_intel 0000:08:00.1: bound 0000:08:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])

This happens soon after the amdgpu driver gets initialized. Later we
see another initialization phase:

[ 26.904127] rfkill: input handler enabled
[ 37.264152] [drm] PCIE GART of 1024M enabled (table at 0x000000F400000000).

There are 10 seconds between them. Then it complained about something:

[ 37.363287] [drm] UVD initialized successfully.
[ 37.473340] [drm] VCE initialized successfully.
[ 37.477942] amdgpu 0000:08:00.0: [drm] Cannot find any crtc or sizes
...

and, going further, it hits the HD-audio error:

[ 38.936624] [drm] fb mappable at 0x4B0696000
[ 38.936626] [drm] vram apper at 0x4B0000000
[ 38.936626] [drm] size 33177600
[ 38.936627] [drm] fb depth is 24
[ 38.936627] [drm] pitch is 15360
[ 38.936673] amdgpu 0000:08:00.0: fb1: amdgpudrmfb frame buffer device
[ 40.092223] snd_hda_intel 0000:08:00.1: azx_get_response timeout, switching to polling mode: last cmd=0x00170500

After this point, HD-audio communication was screwed up.

The last cmd in the above message is the AC_SET_POWER_STATE verb for the
root node to D0, i.e. the very first command to power up the codec. The
remaining commands are also about powering up each node, so the errors
as a whole indicate that the power-up at runtime resume failed.

So, this looks to me as if the device gets runtime-resumed at a bad
moment?


thanks,

Takashi