Date: Wed, 29 Apr 2020 18:05:04 +0200
From: Takashi Iwai
To: Alex Deucher
Cc: Nicholas Johnson, "Zhou, David(ChunMing)", "alsa-devel@alsa-project.org", "linux-kernel@vger.kernel.org", "amd-gfx@lists.freedesktop.org", Takashi Iwai, Lukas Wunner, "Deucher, Alexander", "Koenig, Christian"
Subject: Re: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 29 Apr 2020 17:47:47 +0200,
Alex Deucher wrote:
>
> On Wed, Apr 29, 2020 at 11:27 AM Nicholas Johnson
> wrote:
> >
> > On Wed, Apr 29, 2020 at 09:37:41AM +0200, Takashi Iwai wrote:
> > > On Tue, 28 Apr 2020 16:48:45 +0200,
> > > Nicholas Johnson wrote:
> > > >
> > > > > > > >
> > > > > > > > FWIW, I have a fiji board in a desktop system and it worked fine when
> > > > > > > > this code was enabled.
> > > > > > >
> > > > > > > Is the new DC code used for Fiji boards? IIRC, the audio component
> > > > > > > binding from amdgpu is enabled only for DC, and without the audio
> > > > > > > component binding the runtime PM won't be linked up, hence you can't
> > > > > > > power up GPU from the audio side access automatically.
> > > > > > >
> > > > > >
> > > > > > Yes, DC is enabled by default for all cards with runtime pm enabled.
> > > > >
> > > > > OK, thanks, I found that amdgpu got bound via component in the dmesg
> > > > > output, too:
> > > > >
> > > > > [ 21.294927] snd_hda_intel 0000:08:00.1: bound 0000:08:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
> > > > >
> > > > > This is the place soon after amdgpu driver gets initialized.
> > > > > Then we see later another initialization phase:
> > > > >
> > > > > [ 26.904127] rfkill: input handler enabled
> > > > > [ 37.264152] [drm] PCIE GART of 1024M enabled (table at 0x000000F400000000).
> > > > >
> > > > > here shows 10 seconds between them. Then, it complained something:
> > > > >
> > > > >
> > > > > [ 37.363287] [drm] UVD initialized successfully.
> > > > > [ 37.473340] [drm] VCE initialized successfully.
> > > > > [ 37.477942] amdgpu 0000:08:00.0: [drm] Cannot find any crtc or sizes
> > > >
> > > > The above would be me hitting WindowsKey+P to change screens, but with
> > > > no DisplayPort attached to Fiji, hence it unable to find crtc.
> > > >
> > > > >
> > > > > ... and go further, and hitting HD-audio error:
> > > > >
> > > > That would be me having attached the DisplayPort and done WindowsKey+P
> > > > again.
> > > >
> > > > > [ 38.936624] [drm] fb mappable at 0x4B0696000
> > > > > [ 38.936626] [drm] vram apper at 0x4B0000000
> > > > > [ 38.936626] [drm] size 33177600
> > > > > [ 38.936627] [drm] fb depth is 24
> > > > > [ 38.936627] [drm] pitch is 15360
> > > > > [ 38.936673] amdgpu 0000:08:00.0: fb1: amdgpudrmfb frame buffer device
> > > > > [ 40.092223] snd_hda_intel 0000:08:00.1: azx_get_response timeout, switching to polling mode: last cmd=0x00170500
> > > > >
> > > > > After this point, HD-audio communication was screwed up.
> > > > >
> > > > > This lastcmd in the above message is AC_SET_POWER_STATE verb for the
> > > > > root node to D0, so the very first command to power up the codec.
> > > > > The rest commands are also about the power up of each node, so the
> > > > > whole error indicate that the power up at runtime resume failed.
> > > > >
> > > > > So, this looks to me as if the device gets runtime-resumed at the bad
> > > > > moment?
> > > > It does. However, this is not going to be easy to pin down.
> > > >
> > > > I moved from Arch to Ubuntu, and it behaves differently. I cannot
> > > > trigger the bug in Ubuntu. Plus, it puts the GPUs asleep, even if
> > > > attached at boot, unlike Arch. I will continue to try to trigger it. But
> > > > even if this is a problem with the Linux distribution, it should not be
> > > > able to trigger a kernel mode bug, so we should persist with finding it.
> > >
> > > Sure, that's a bug to be fixed.
> > >
> > > This made me thinking what happens if we load the HD-audio driver very
> > > late. Could you try to blacklist snd-hda-intel module, then load it
> > > manually after plugging the DP monitor and activating it?
> > Attached dmesg-blacklist-*
> >
> > It is interesting. If I enable the monitor with the module unloaded, and
> > then load the module, I cannot trigger the bug, even if disabling the
> > monitor, waiting for GPU to sleep, and then waking again.
> >
> > Even if I wake monitor up, put to sleep again, and then insmod when
> > sleeping, it does not cause bug when waking again.
> >
> > Is there anything special about the first time the monitor is used?
> >
>
> What do you mean by used? Do you mean plugged in to the GPU or used
> in the GUI? It might be easier to debug this without a GUI involved.
> Can you try this at runlevel 3 or something equivalent for your
> distro?
>
> When the GPU is powered up, the driver gets an interrupt when a
> display is hotplugged and generates an event and userspace
> applications can listen for these events. When the GPU is powered
> down, there's no interrupt. I think most GUIs poll GPUs periodically
> to handle this case so they can detect a new display even when the GPU
> is off. Maybe we are getting some sort of race here. GUI queries GPU
> driver, causes GPU to wake up, checks attached displays, GPU driver
> resets runtime pm timer. GPU goes back to sleep. The detection
> updates the ELD data which causes the HDA driver to wake up. It
> assumes the hw is on and tries to query it. In the meantime, the GPU
> has already powered everything down again.

Well, but the code path there is the runtime PM resume of the audio
device, which means that the GPU must have been runtime-resumed again
beforehand via the device link. So it should have worked from the
beginning, but in reality it did not -- that is, apparently some
inconsistency shows up in the initial attempt of the runtime
resume...


Takashi
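
For reference, the `last cmd=0x00170500` value in the quoted dmesg
output can be unpacked by hand. Below is a minimal sketch in Python,
assuming the 12-bit-verb command layout that the Linux HD-audio core
uses when it builds a command word (codec address in bits 31:28, node
ID in bits 26:20, verb in bits 19:8, payload in bits 7:0). The names
`HdaCommand` and `decode_hda_command` are made up for illustration and
are not kernel APIs; commands that use the short 4-bit verb form with a
16-bit payload are not handled.

```python
from typing import NamedTuple


class HdaCommand(NamedTuple):
    """Fields of an HD-audio command word (12-bit verb form only)."""
    codec_addr: int  # bits 31:28
    nid: int         # bits 26:20, the target node ID
    verb: int        # bits 19:8
    payload: int     # bits 7:0


def decode_hda_command(cmd: int) -> HdaCommand:
    """Split a 32-bit command word into its fields.

    Assumption: the word uses the 12-bit verb / 8-bit payload form.
    """
    return HdaCommand(
        codec_addr=(cmd >> 28) & 0xF,
        nid=(cmd >> 20) & 0x7F,
        verb=(cmd >> 8) & 0xFFF,
        payload=cmd & 0xFF,
    )
```

Under that layout, `decode_hda_command(0x00170500)` yields codec
address 0, node 0x01, verb 0x705 and payload 0x00. In the kernel
headers 0x705 is the set-power-state verb and a payload of 0x00 selects
D0, which is consistent with the reading of the log given in the quoted
message.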