From: Alex Deucher
Date: Fri, 18 Sep 2020 09:57:37 -0400
Subject: Re: [PATCH AUTOSEL 5.4 265/330] drm/amd/powerplay: try to do a graceful shutdown on SW CTF
To: "Quan, Evan"
Cc: Sasha Levin, "linux-kernel@vger.kernel.org", "stable@vger.kernel.org", "Deucher, Alexander", "dri-devel@lists.freedesktop.org"
References: <20200918020110.2063155-1-sashal@kernel.org> <20200918020110.2063155-265-sashal@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Sep 18, 2020 at 3:17 AM Quan, Evan wrote:
>
> [AMD Official Use Only - Internal Distribution Only]
>
> Hi @Sasha Levin @Deucher, Alexander,
>
> The following changes need to be applied also.
> Otherwise, you may see unexpected shutdown on stress gpu loading on Vega10.
>
> drm/amd/pm: avoid false alarm due to confusing softwareshutdowntemp setting
> drm/amd/pm: correct the thermal alert temperature limit settings
> drm/amd/pm: correct Vega20 swctf limit setting
> drm/amd/pm: correct Vega12 swctf limit setting
> drm/amd/pm: correct Vega10 swctf limit setting

I would suggest we just drop this patch for kernels prior to 5.8
(where it was introduced).

Alex

>
> BR
> Evan
> -----Original Message-----
> From: Sasha Levin
> Sent: Friday, September 18, 2020 10:00 AM
> To: linux-kernel@vger.kernel.org; stable@vger.kernel.org
> Cc: Quan, Evan; Deucher, Alexander; Sasha Levin; dri-devel@lists.freedesktop.org
> Subject: [PATCH AUTOSEL 5.4 265/330] drm/amd/powerplay: try to do a graceful shutdown on SW CTF
>
> From: Evan Quan
>
> [ Upstream commit 9495220577416632675959caf122e968469ffd16 ]
>
> Normally this(SW CTF) should not happen. And by doing graceful shutdown we can prevent further damage.
>
> Signed-off-by: Evan Quan
> Reviewed-by: Alex Deucher
> Signed-off-by: Alex Deucher
> Signed-off-by: Sasha Levin
> ---
>  .../gpu/drm/amd/powerplay/hwmgr/smu_helper.c  | 21 +++++++++++++++----
>  drivers/gpu/drm/amd/powerplay/smu_v11_0.c     |  7 +++++++
>  2 files changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c b/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c
> index d09690fca4520..414added3d02c 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c
> @@ -22,6 +22,7 @@
>   */
>
>  #include
> +#include
>
>  #include "hwmgr.h"
>  #include "pp_debug.h"
> @@ -593,12 +594,18 @@ int phm_irq_process(struct amdgpu_device *adev,
>  	uint32_t src_id = entry->src_id;
>
>  	if (client_id == AMDGPU_IRQ_CLIENTID_LEGACY) {
> -		if (src_id == VISLANDS30_IV_SRCID_CG_TSS_THERMAL_LOW_TO_HIGH)
> +		if (src_id == VISLANDS30_IV_SRCID_CG_TSS_THERMAL_LOW_TO_HIGH) {
>  			pr_warn("GPU over temperature range detected on PCIe %d:%d.%d!\n",
>  				PCI_BUS_NUM(adev->pdev->devfn),
>  				PCI_SLOT(adev->pdev->devfn),
>  				PCI_FUNC(adev->pdev->devfn));
> -		else if (src_id == VISLANDS30_IV_SRCID_CG_TSS_THERMAL_HIGH_TO_LOW)
> +			/*
> +			 * SW CTF just occurred.
> +			 * Try to do a graceful shutdown to prevent further damage.
> +			 */
> +			dev_emerg(adev->dev, "System is going to shutdown due to SW CTF!\n");
> +			orderly_poweroff(true);
> +		} else if (src_id == VISLANDS30_IV_SRCID_CG_TSS_THERMAL_HIGH_TO_LOW)
>  			pr_warn("GPU under temperature range detected on PCIe %d:%d.%d!\n",
>  				PCI_BUS_NUM(adev->pdev->devfn),
>  				PCI_SLOT(adev->pdev->devfn),
> @@ -609,12 +616,18 @@ int phm_irq_process(struct amdgpu_device *adev,
>  				PCI_SLOT(adev->pdev->devfn),
>  				PCI_FUNC(adev->pdev->devfn));
>  	} else if (client_id == SOC15_IH_CLIENTID_THM) {
> -		if (src_id == 0)
> +		if (src_id == 0) {
>  			pr_warn("GPU over temperature range detected on PCIe %d:%d.%d!\n",
>  				PCI_BUS_NUM(adev->pdev->devfn),
>  				PCI_SLOT(adev->pdev->devfn),
>  				PCI_FUNC(adev->pdev->devfn));
> -		else
> +			/*
> +			 * SW CTF just occurred.
> +			 * Try to do a graceful shutdown to prevent further damage.
> +			 */
> +			dev_emerg(adev->dev, "System is going to shutdown due to SW CTF!\n");
> +			orderly_poweroff(true);
> +		} else
>  			pr_warn("GPU under temperature range detected on PCIe %d:%d.%d!\n",
>  				PCI_BUS_NUM(adev->pdev->devfn),
>  				PCI_SLOT(adev->pdev->devfn),
> diff --git a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
> index c4d8c52c6b9ca..6c4405622c9bb 100644
> --- a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
> +++ b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
> @@ -23,6 +23,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include "pp_debug.h"
>  #include "amdgpu.h"
> @@ -1538,6 +1539,12 @@ static int smu_v11_0_irq_process(struct amdgpu_device *adev,
>  				PCI_BUS_NUM(adev->pdev->devfn),
>  				PCI_SLOT(adev->pdev->devfn),
>  				PCI_FUNC(adev->pdev->devfn));
> +			/*
> +			 * SW CTF just occurred.
> +			 * Try to do a graceful shutdown to prevent further damage.
> +			 */
> +			dev_emerg(adev->dev, "System is going to shutdown due to SW CTF!\n");
> +			orderly_poweroff(true);
>  			break;
>  		case THM_11_0__SRCID__THM_DIG_THERM_H2L:
>  			pr_warn("GPU under temperature range detected on PCIe %d:%d.%d!\n",
> --
> 2.25.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
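
For readers skimming the quoted diff, the control flow it adds can be sketched in user space roughly as follows. This is only an illustration under stated assumptions, not the driver code: the enum values, handle_thermal_irq() and fake_orderly_poweroff() are hypothetical names, and the stand-in merely records that a shutdown was requested instead of calling the kernel's orderly_poweroff().

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical interrupt sources. The real driver tests IDs such as
 * VISLANDS30_IV_SRCID_CG_TSS_THERMAL_LOW_TO_HIGH; those values are
 * not reproduced here.
 */
enum thermal_src {
	THERM_LOW_TO_HIGH,	/* temperature rose above the limit */
	THERM_HIGH_TO_LOW,	/* temperature fell back below the limit */
};

/* Records whether the stand-in poweroff was requested. */
static bool poweroff_requested;

/* Stand-in for orderly_poweroff(true); it only sets a flag. */
static void fake_orderly_poweroff(void)
{
	poweroff_requested = true;
}

/* Returns true when the event led to a shutdown request. */
static bool handle_thermal_irq(enum thermal_src src)
{
	switch (src) {
	case THERM_LOW_TO_HIGH:
		/* Over-temperature: warn, then ask for a graceful shutdown. */
		fprintf(stderr, "GPU over temperature range detected!\n");
		fprintf(stderr, "System is going to shutdown due to SW CTF!\n");
		fake_orderly_poweroff();
		return true;
	case THERM_HIGH_TO_LOW:
		/* Under-temperature: warn only, as before the patch. */
		fprintf(stderr, "GPU under temperature range detected!\n");
		return false;
	}
	return false;
}
```

In this sketch every over-temperature event requests a shutdown, so a limit that is set too low would power the machine off under ordinary load. That appears to be the concern behind the swctf limit fixes listed earlier in the thread.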