References: <20230329095933.1203559-1-kai.heng.feng@canonical.com> <76853776-ddfd-2fbc-a209-ca4f77faa481@amd.com>
From: Alex Deucher
Date: Thu, 30 Mar 2023 09:22:29 -0400
Subject: Re: [PATCH 1/2] drm/amdgpu: Reset GPU on S0ix when device supports BOCO
To: Kai-Heng Feng
Cc: Mario Limonciello, Jingyu Wang, Xinhui.Pan@amd.com, Andrey Grodzovsky,
 Lijo Lazar, dri-devel@lists.freedesktop.org, Michel Dänzer, YiPeng Chai,
 Guchun Chen, "Rafael J. Wysocki", amd-gfx@lists.freedesktop.org,
 Jiansong Chen, Kenneth Feng, Tim Huang, Bokun Zhang, Hans de Goede,
 Maxime Ripard, Evan Quan, Somalapuram Amaranath,
 linux-kernel@vger.kernel.org, alexander.deucher@amd.com,
 christian.koenig@amd.com, Hawking Zhang
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 29, 2023 at 11:36 PM Kai-Heng Feng wrote:
>
> On Wed, Mar 29, 2023 at 9:23 PM Mario Limonciello
> wrote:
> >
> >
> > On 3/29/23 04:59, Kai-Heng Feng wrote:
> > > When the power is lost due to ACPI power resources being turned off, the
> > > driver should reset the GPU so it can work anew.
> > >
> > > First, _PR3 support of the hierarchy needs to be found correctly. Since
> > > the GPU on some discrete GFX cards is behind a PCIe switch, checking the
> > > _PR3 on downstream port alone is not enough, as the _PR3 can associate
> > > to the root port above the PCIe switch.
> >
> > I think this should be split into two commits:
> >
> > * One of them to look at _PR3 further up in hierarchy to fix indication
> > for BOCO support.
>
> Yes, this part can be split up.
>
> >
> > * One to adjust policy for whether to reset
>
> IIUC, the GPU only needs to be reset when the power status isn't certain?
>
> Assuming power resources in _PR3 are really disabled, GPU is already
> reset by itself. That means reset shouldn't be necessary for D3cold,
> am I understanding it correctly?

Right, if D3cold actually works, then no reset is necessary.
>
> However, this is a desktop with a plugged-in GFX card that has external
> power, does that assumption still stand? Performing a reset on D3cold
> can cover this scenario.

BOCO is generally only available on laptops and all-in-one systems
where the dGPU is integrated into the platform. Power to the dGPU is
controlled by a GPIO which is toggled by the ACPI _PR3 method for the
device. There is an ATPX method on all platforms which support BOCO.
Since this is an AIB in a desktop system, I doubt it actually supports
D3cold. For desktop systems, we have what we call BACO where the
driver controls power to everything on the GPU except the bus
interface. In the BACO case, we can turn off the GPU, but the device
still shows up on the PCI bus. For BOCO, the device is completely
powered down and disappears from the PCI bus.

Alex

>
> > >
> > > Once the _PR3 is found and BOCO support is correctly marked, use that
> > > information to decide whether the GPU should be reset. This solves an
> > > issue where the system freezes on an Intel ADL desktop that uses S0ix
> > > for sleep and D3cold is supported for the GFX slot.
> >
> > I'm worried this is still papering over an underlying issue with L0s
> > handling on ADL + Navi1x/Navi2x.
>
> Is it possible to get the ASIC's ASPM parameter under Windows? Knowing
> the difference can be useful.
>
> >
> > Also, what about runtime suspend? If you unplug the monitor from this
> > dGPU and interact with it over SSH it should go into runtime suspend.
> >
> > Is it working properly for that case now?
>
> Thanks for the tip.
> Runtime resume doesn't work at all:
> [ 1087.601631] pcieport 0000:00:01.0: power state changed by ACPI to D0
> [ 1087.613820] pcieport 0000:00:01.0: restoring config space at offset 0x2c (was 0x43, writing 0x43)
> [ 1087.613835] pcieport 0000:00:01.0: restoring config space at offset 0x28 (was 0x41, writing 0x41)
> [ 1087.613841] pcieport 0000:00:01.0: restoring config space at offset 0x24 (was 0xfff10001, writing 0xfff10001)
> [ 1087.613978] pcieport 0000:00:01.0: PME# disabled
> [ 1087.613984] pcieport 0000:00:01.0: waiting 100 ms for downstream link, after activation
> [ 1089.330956] pcieport 0000:01:00.0: not ready 1023ms after resume; giving up
> [ 1089.373036] pcieport 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
>
> After a short while the whole system froze.
>
> So the upstream port of GFX's PCIe switch cannot be powered on again.
>
> Kai-Heng
>
> >
> >
> > >
> > > Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
> > > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885
> > > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2458
> > > Signed-off-by: Kai-Heng Feng
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c   |  3 +++
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  7 ++++++-
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 12 +++++-------
> > >  3 files changed, 14 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > > index 60b1857f469e..407456ac0e84 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > > @@ -987,6 +987,9 @@ bool amdgpu_acpi_should_gpu_reset(struct amdgpu_device *adev)
> > >         if (amdgpu_sriov_vf(adev))
> > >                 return false;
> > >
> > > +       if (amdgpu_device_supports_boco(adev_to_drm(adev)))
> > > +               return true;
> > > +
> > >  #if IS_ENABLED(CONFIG_SUSPEND)
> > >         return pm_suspend_target_state != PM_SUSPEND_TO_IDLE;
> > >  #else
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > index f5658359ff5c..d56b7a2bafa6 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > @@ -2181,7 +2181,12 @@ static int amdgpu_device_ip_early_init(struct amdgpu_device *adev)
> > >
> > >         if (!(adev->flags & AMD_IS_APU)) {
> > >                 parent = pci_upstream_bridge(adev->pdev);
> > > -               adev->has_pr3 = parent ? pci_pr3_present(parent) : false;
> > > +               do {
> > > +                       if (pci_pr3_present(parent)) {
> > > +                               adev->has_pr3 = true;
> > > +                               break;
> > > +                       }
> > > +               } while ((parent = pci_upstream_bridge(parent)));
> > >         }
> > >
> > >         amdgpu_amdkfd_device_probe(adev);
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > index ba5def374368..5d81fcac4b0a 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > @@ -2415,10 +2415,11 @@ static int amdgpu_pmops_suspend(struct device *dev)
> > >         struct drm_device *drm_dev = dev_get_drvdata(dev);
> > >         struct amdgpu_device *adev = drm_to_adev(drm_dev);
> > >
> > > -       if (amdgpu_acpi_is_s0ix_active(adev))
> > > -               adev->in_s0ix = true;
> > > -       else if (amdgpu_acpi_is_s3_active(adev))
> > > +       if (amdgpu_acpi_is_s3_active(adev) ||
> > > +           amdgpu_device_supports_boco(drm_dev))
> > >                 adev->in_s3 = true;
> > > +       else if (amdgpu_acpi_is_s0ix_active(adev))
> > > +               adev->in_s0ix = true;
> > >         if (!adev->in_s0ix && !adev->in_s3)
> > >                 return 0;
> > >         return amdgpu_device_suspend(drm_dev, true);
> > > @@ -2449,10 +2450,7 @@ static int amdgpu_pmops_resume(struct device *dev)
> > >                 adev->no_hw_access = true;
> > >
> > >         r = amdgpu_device_resume(drm_dev, true);
> > > -       if (amdgpu_acpi_is_s0ix_active(adev))
> > > -               adev->in_s0ix = false;
> > > -       else
> > > -               adev->in_s3 = false;
> > > +       adev->in_s0ix = adev->in_s3 = false;
> > >         return r;
> > >  }
> > >
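
The hierarchy walk in the amdgpu_device.c hunk above can be modelled in user space as below. This is only a sketch under stated assumptions, not kernel code and not part of the patch: struct fake_pci_dev, fake_upstream_bridge(), hierarchy_has_pr3() and the two demo_*() helpers are hypothetical stand-ins, and the has_pr3 field merely models what pci_pr3_present() would report for a bridge. One difference from the hunk is deliberate: the do/while in the patch calls pci_pr3_present(parent) before checking that parent is non-NULL, while the line it replaces had a `parent ?` guard. Whether that is safe depends on pci_pr3_present() itself, so the sketch keeps the guard by using a plain while loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev: only what the walk needs. */
struct fake_pci_dev {
	struct fake_pci_dev *upstream; /* NULL at the top of the hierarchy */
	bool has_pr3;                  /* models the result of pci_pr3_present() */
};

/* Models pci_upstream_bridge(): returns the parent bridge, or NULL. */
struct fake_pci_dev *fake_upstream_bridge(const struct fake_pci_dev *dev)
{
	return dev->upstream;
}

/*
 * Walk from the device's parent towards the root port and report whether
 * any bridge on the way exposes _PR3. The while loop checks for NULL
 * before every access, so a device without an upstream bridge is handled.
 */
bool hierarchy_has_pr3(const struct fake_pci_dev *dev)
{
	const struct fake_pci_dev *parent;

	assert(dev != NULL);

	parent = fake_upstream_bridge(dev);
	while (parent) {
		if (parent->has_pr3)
			return true;
		parent = fake_upstream_bridge(parent);
	}
	return false;
}

/* GPU behind a PCIe switch; only the root port above the switch has _PR3. */
bool demo_gpu_behind_switch(void)
{
	struct fake_pci_dev root_port   = { .upstream = NULL,         .has_pr3 = true  };
	struct fake_pci_dev switch_up   = { .upstream = &root_port,   .has_pr3 = false };
	struct fake_pci_dev switch_down = { .upstream = &switch_up,   .has_pr3 = false };
	struct fake_pci_dev gpu         = { .upstream = &switch_down, .has_pr3 = false };

	return hierarchy_has_pr3(&gpu);
}

/* Device with no upstream bridge at all: the walk must simply find nothing. */
bool demo_no_upstream_bridge(void)
{
	struct fake_pci_dev gpu = { .upstream = NULL, .has_pr3 = false };

	return hierarchy_has_pr3(&gpu);
}
```

In this model, checking only the immediate parent (switch_down) would miss the _PR3 on the root port, which is the case the commit message describes for discrete cards that sit behind a PCIe switch.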