From: Alex Deucher
Date: Wed, 29 Apr 2020 12:19:57 -0400
Subject: Re: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state
To: Takashi Iwai
Cc: Nicholas Johnson, "Zhou, David(ChunMing)",
    "alsa-devel@alsa-project.org", "linux-kernel@vger.kernel.org",
    "amd-gfx@lists.freedesktop.org", Takashi Iwai, Lukas Wunner,
    "Deucher, Alexander", "Koenig, Christian"
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 29, 2020 at 12:05 PM Takashi Iwai wrote:
>
> On Wed, 29 Apr 2020 17:47:47 +0200,
> Alex Deucher wrote:
> >
> > On Wed, Apr 29, 2020 at 11:27 AM Nicholas Johnson
> > wrote:
> > >
> > > On Wed, Apr 29, 2020 at 09:37:41AM +0200, Takashi Iwai wrote:
> > > > On Tue, 28 Apr 2020 16:48:45 +0200,
> > > > Nicholas Johnson wrote:
> > > > >
> > > > > > > > >
> > > > > > > > > FWIW, I have a fiji board in a desktop system and it worked fine when
> > > > > > > > > this code was enabled.
> > > > > > > >
> > > > > > > > Is the new DC code used for Fiji boards?
> > > > > > > > IIRC, the audio component
> > > > > > > > binding from amdgpu is enabled only for DC, and without the audio
> > > > > > > > component binding the runtime PM won't be linked up, hence you can't
> > > > > > > > power up GPU from the audio side access automatically.
> > > > > > > >
> > > > > > >
> > > > > > > Yes, DC is enabled by default for all cards with runtime pm enabled.
> > > > > >
> > > > > > OK, thanks, I found that amdgpu got bound via component in the dmesg
> > > > > > output, too:
> > > > > >
> > > > > > [ 21.294927] snd_hda_intel 0000:08:00.1: bound 0000:08:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
> > > > > >
> > > > > > This is the place soon after amdgpu driver gets initialized.
> > > > > > Then we see later another initialization phase:
> > > > > >
> > > > > > [ 26.904127] rfkill: input handler enabled
> > > > > > [ 37.264152] [drm] PCIE GART of 1024M enabled (table at 0x000000F400000000).
> > > > > >
> > > > > > here shows 10 seconds between them. Then, it complained something:
> > > > > >
> > > > > >
> > > > > > [ 37.363287] [drm] UVD initialized successfully.
> > > > > > [ 37.473340] [drm] VCE initialized successfully.
> > > > > > [ 37.477942] amdgpu 0000:08:00.0: [drm] Cannot find any crtc or sizes
> > > > >
> > > > > The above would be me hitting WindowsKey+P to change screens, but with
> > > > > no DisplayPort attached to Fiji, hence it unable to find crtc.
> > > > >
> > > > > >
> > > > > > ... and go further, and hitting HD-audio error:
> > > > > >
> > > > > That would be me having attached the DisplayPort and done WindowsKey+P
> > > > > again.
> > > > >
> > > > > > [ 38.936624] [drm] fb mappable at 0x4B0696000
> > > > > > [ 38.936626] [drm] vram apper at 0x4B0000000
> > > > > > [ 38.936626] [drm] size 33177600
> > > > > > [ 38.936627] [drm] fb depth is 24
> > > > > > [ 38.936627] [drm] pitch is 15360
> > > > > > [ 38.936673] amdgpu 0000:08:00.0: fb1: amdgpudrmfb frame buffer device
> > > > > > [ 40.092223] snd_hda_intel 0000:08:00.1: azx_get_response timeout, switching to polling mode: last cmd=0x00170500
> > > > > >
> > > > > > After this point, HD-audio communication was screwed up.
> > > > > >
> > > > > > This lastcmd in the above message is AC_SET_POWER_STATE verb for the
> > > > > > root node to D0, so the very first command to power up the codec.
> > > > > > The rest commands are also about the power up of each node, so the
> > > > > > whole error indicate that the power up at runtime resume failed.
> > > > > >
> > > > > > So, this looks to me as if the device gets runtime-resumed at the bad
> > > > > > moment?
> > > > > It does. However, this is not going to be easy to pin down.
> > > > >
> > > > > I moved from Arch to Ubuntu, and it behaves differently. I cannot
> > > > > trigger the bug in Ubuntu. Plus, it puts the GPUs asleep, even if
> > > > > attached at boot, unlike Arch. I will continue to try to trigger it. But
> > > > > even if this is a problem with the Linux distribution, it should not be
> > > > > able to trigger a kernel mode bug, so we should persist with finding it.
> > > >
> > > > Sure, that's a bug to be fixed.
> > > >
> > > > This made me thinking what happens if we load the HD-audio driver very
> > > > late. Could you try to blacklist snd-hda-intel module, then load it
> > > > manually after plugging the DP monitor and activating it?
> > > Attached dmesg-blacklist-*
> > >
> > > It is interesting.
> > > If I enable the monitor with the module unloaded, and
> > > then load the module, I cannot trigger the bug, even if disabling the
> > > monitor, waiting for GPU to sleep, and then waking again.
> > >
> > > Even if I wake monitor up, put to sleep again, and then insmod when
> > > sleeping, it does not cause bug when waking again.
> > >
> > > Is there anything special about the first time the monitor is used?
> > >
> >
> > What do you mean by used? Do you mean plugged in to the GPU or used
> > in the GUI? It might be easier to debug this without a GUI involved.
> > Can you try this at runlevel 3 or something equivalent for your
> > distro?
> >
> > When the GPU is powered up, the driver gets an interrupt when a
> > display is hotplugged and generates an event and userspace
> > applications can listen for these events. When the GPU is powered
> > down, there's no interrupt. I think most GUIs poll GPUs periodically
> > to handle this case so they can detect a new display even when the GPU
> > is off. Maybe we are getting some sort of race here. GUI queries GPU
> > driver, causes GPU to wake up, checks attached displays, GPU driver
> > resets runtime pm timer. GPU goes back to sleep. The detection
> > updates the ELD data which causes the HDA driver to wake up. It
> > assumes the hw is on and tries to query it. In the meantime, the GPU
> > has already powered everything down again.
>
> Well, but the code path there is the runtime PM resume of the audio
> device and it means that GPU must have been runtime-resumed again
> beforehand via the device link. So, it should have worked from the
> beginning but in reality not -- that is, apparently some inconsistency
> is found in the initial attempt of the runtime resume...

Yeah, it should be covered, but I wonder if there is something in the
ELD update sequence that needs to call pm_runtime_get_sync()? The ELD
sequence on AMD GPUs doesn't work the same as on other vendors.
The GPU driver has a backdoor into the HDA device's verbs to update the
audio state rather than doing it via an ELD buffer update. We still
update the ELD buffer for consistency. Maybe when the GPU driver sets
the audio state at monitor detection time, that triggers an interrupt
or something on the HDA side which races with the CPU and the power
down of the GPU. That still seems unlikely though, since the runtime pm
on the GPU side defaults to a 5 second suspend timer.

Alex
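
For reference, the failing command quoted above (last cmd=0x00170500) can be unpacked by hand. What follows is only a minimal sketch in Python, assuming the usual HD-audio 12-bit-verb command layout: codec address in bits 31:28, node ID in bits 27:20, verb in bits 19:8, payload in bits 7:0. The names decode_hda_cmd and HdaCommand are made up for illustration and are not kernel APIs. The two constants are assumed to match the kernel's HD-audio verb definitions and should be checked against the headers.

```python
from typing import NamedTuple

# Assumed values, mirroring the kernel's HD-audio verb definitions.
AC_VERB_SET_POWER_STATE = 0x705
AC_PWRST_D0 = 0x00


class HdaCommand(NamedTuple):
    """Fields of a 32-bit HD-audio command word (12-bit verb form)."""
    codec_addr: int  # bits 31:28
    nid: int         # bits 27:20
    verb: int        # bits 19:8
    param: int       # bits 7:0


def decode_hda_cmd(cmd: int) -> HdaCommand:
    """Split a raw command word, as printed after 'last cmd=', into fields."""
    return HdaCommand(
        codec_addr=(cmd >> 28) & 0xF,
        nid=(cmd >> 20) & 0xFF,
        verb=(cmd >> 8) & 0xFFF,
        param=cmd & 0xFF,
    )


if __name__ == "__main__":
    c = decode_hda_cmd(0x00170500)
    print(f"codec={c.codec_addr} nid={c.nid:#04x} "
          f"verb={c.verb:#05x} param={c.param:#04x}")
    # True when the word is a power-state request for D0.
    print("SET_POWER_STATE to D0:",
          c.verb == AC_VERB_SET_POWER_STATE and c.param == AC_PWRST_D0)
```

For 0x00170500 this yields codec address 0, node 0x01, verb 0x705 and payload 0x00, which is consistent with the quoted reading of it as a power-up request to D0 issued at the start of runtime resume.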