Date: Wed, 29 Apr 2020 17:43:35 +0200
From: Takashi Iwai
To: Nicholas Johnson
Cc: Alex Deucher, "Zhou, David(ChunMing)", "alsa-devel@alsa-project.org", "linux-kernel@vger.kernel.org", "amd-gfx@lists.freedesktop.org", Takashi Iwai, Lukas Wunner, "Deucher, Alexander", "Koenig, Christian"
Subject: Re: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 29 Apr 2020 17:27:17 +0200,
Nicholas Johnson wrote:
>
> On Wed, Apr 29, 2020 at 09:37:41AM +0200, Takashi Iwai wrote:
> > On Tue, 28 Apr 2020 16:48:45 +0200,
> > Nicholas Johnson wrote:
> > >
> > > > > > > FWIW, I have a fiji board in a desktop system and it worked fine when
> > > > > > > this code was enabled.
> > > > > >
> > > > > > Is the new DC code used for Fiji boards? IIRC, the audio component
> > > > > > binding from amdgpu is enabled only for DC, and without the audio
> > > > > > component binding the runtime PM won't be linked up, hence you can't
> > > > > > power up GPU from the audio side access automatically.
> > > > > >
> > > > >
> > > > > Yes, DC is enabled by default for all cards with runtime pm enabled.
> > > >
> > > > OK, thanks, I found that amdgpu got bound via component in the dmesg
> > > > output, too:
> > > >
> > > > [ 21.294927] snd_hda_intel 0000:08:00.1: bound 0000:08:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
> > > >
> > > > This is the place soon after amdgpu driver gets initialized.
> > > > Then we see later another initialization phase:
> > > >
> > > > [ 26.904127] rfkill: input handler enabled
> > > > [ 37.264152] [drm] PCIE GART of 1024M enabled (table at 0x000000F400000000).
> > > >
> > > > here shows 10 seconds between them. Then, it complained something:
> > > >
> > > > [ 37.363287] [drm] UVD initialized successfully.
> > > > [ 37.473340] [drm] VCE initialized successfully.
> > > > [ 37.477942] amdgpu 0000:08:00.0: [drm] Cannot find any crtc or sizes
> > >
> > > The above would be me hitting WindowsKey+P to change screens, but with
> > > no DisplayPort attached to Fiji, hence it unable to find crtc.
> > >
> > > >
> > > > ... and go further, and hitting HD-audio error:
> > > >
> > > That would be me having attached the DisplayPort and done WindowsKey+P
> > > again.
> > >
> > > > [ 38.936624] [drm] fb mappable at 0x4B0696000
> > > > [ 38.936626] [drm] vram apper at 0x4B0000000
> > > > [ 38.936626] [drm] size 33177600
> > > > [ 38.936627] [drm] fb depth is 24
> > > > [ 38.936627] [drm] pitch is 15360
> > > > [ 38.936673] amdgpu 0000:08:00.0: fb1: amdgpudrmfb frame buffer device
> > > > [ 40.092223] snd_hda_intel 0000:08:00.1: azx_get_response timeout, switching to polling mode: last cmd=0x00170500
> > > >
> > > > After this point, HD-audio communication was screwed up.
> > > >
> > > > This lastcmd in the above message is AC_SET_POWER_STATE verb for the
> > > > root node to D0, so the very first command to power up the codec.
> > > > The rest commands are also about the power up of each node, so the
> > > > whole error indicate that the power up at runtime resume failed.
> > > >
> > > > So, this looks to me as if the device gets runtime-resumed at the bad
> > > > moment?
> > > It does. However, this is not going to be easy to pin down.
> > >
> > > I moved from Arch to Ubuntu, and it behaves differently. I cannot
> > > trigger the bug in Ubuntu. Plus, it puts the GPUs asleep, even if
> > > attached at boot, unlike Arch. I will continue to try to trigger it. But
> > > even if this is a problem with the Linux distribution, it should not be
> > > able to trigger a kernel mode bug, so we should persist with finding it.
> >
> > Sure, that's a bug to be fixed.
> >
> > This made me thinking what happens if we load the HD-audio driver very
> > late. Could you try to blacklist snd-hda-intel module, then load it
> > manually after plugging the DP monitor and activating it?
> Attached dmesg-blacklist-*
>
> It is interesting. If I enable the monitor with the module unloaded, and
> then load the module, I cannot trigger the bug, even if disabling the
> monitor, waiting for GPU to sleep, and then waking again.
>
> Even if I wake monitor up, put to sleep again, and then insmod when
> sleeping, it does not cause bug when waking again.
Thanks, that's good news, at least.

> Is there anything special about the first time the monitor is used?

My wild guess is that the audio controller got powered up too early,
before the graphics side became ready. Basically, the HD-audio PCI
controller for HDMI audio is a shadow component of the graphics chip,
so it can't work before the GPU is set up properly.

> > Also, could you track who called the problematic power-up sequence,
> > e.g. by adding WARN_ON_ONCE()?
> Attached dmesg-warning

This showed that it's triggered by the runtime PM resume from opening
the PCM device. That is, a desktop application (most likely
PulseAudio) tried to open the stream because it detected something.
This raises the suspicion that PA received a false-positive
notification about the HDMI audio connection, so...

> > Last but not least, please check /proc/asound/card1/eld#* files (there
> > are both card0 and card1 or such that contain eld#* files, and one is
> > for i915 and another for amdgpu) before and after plugging. This
> > shows whether the audio connection was recognized or not.
> Before plugging: card not yet attached, so the sysfs for that card not
> yet created
>
> After plugging (and insmod snd-hda-intel.ko):
> codec#0 codec#2 eld#2.0 eld#2.1 eld#2.2 eld#2.3 eld#2.4 eld#2.5 eld#2.6 eld#2.7 eld#2.8 id pcm0c pcm0p pcm10p pcm3p pcm7p pcm8p pcm9p

... here comes the question again. What's interesting here is the
contents of the eld#* proc files, not the directory listing. If any of
the eld#* files wrongly shows the state as connected at the moment the
problem appears, it may confuse user space into opening the PCM stream.

Note that you should have multiple /proc/asound/card[0-9]* directories;
one of them is for i915 and another for amdgpu. The interesting
information is only in the latter.


thanks,

Takashi