Message-ID: <462dce7d-73fc-ea8c-0a8a-5e8722ee1967@molgen.mpg.de>
Date: Wed, 20 Apr 2022 23:02:27 +0200
Subject: Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
From: Paul Menzel
To: Richard Gong
Cc: Alex Deucher, Dave Airlie, Xinhui Pan, LKML,
 dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
 Daniel Vetter, Alexander Deucher, Christian König, Mario Limonciello
References: <20220412215000.897344-1-richard.gong@amd.com>
 <91e916e3-d793-b814-6cbf-abee0667f5f8@molgen.mpg.de>
 <94fd858d-1792-9c05-b5c6-1b028427687d@amd.com>
 <295e7882-21a2-f50f-6bfa-b0bae1d0fa12@molgen.mpg.de>
 <67e98c3e-cfa3-ee51-4932-bbad8de5ffd8@amd.com>
In-Reply-To: <67e98c3e-cfa3-ee51-4932-bbad8de5ffd8@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Dear Richard,


On 20.04.22 at 22:56, Gong, Richard wrote:

> On 4/20/2022 3:48 PM, Paul Menzel wrote:
>> On 20.04.22 at 22:40, Alex Deucher wrote:
>>> On Wed, Apr 20, 2022 at 4:29 PM Paul Menzel wrote:
>>
>>>> On 19.04.22 at 23:46, Gong, Richard wrote:
>>>>
>>>>> On 4/14/2022 2:52 AM, Paul Menzel wrote:
>>>>>> [Cc: -kernel test robot]
>>>>
>>>> […]
>>>>
>>>>>> On 13.04.22 at 15:00, Alex Deucher wrote:
>>>>>>> On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel wrote:
>>>>>>
>>>>>>>> Thank you for sending out v4.
>>>>>>>>
>>>>>>>> On 12.04.22 at 23:50, Richard Gong wrote:
>>>>>>>>> Active State Power Management (ASPM) feature is enabled since
>>>>>>>>> kernel 5.14.
>>>>>>>>> There are some AMD GFX cards (such as WX3200 and RX640) that won't
>>>>>>>>> work
>>>>>>>>> with ASPM-enabled Intel Alder Lake based systems.
>>>>>>>>> Using these GFX
>>>>>>>>> cards as
>>>>>>>>> video/display output, Intel Alder Lake based systems will hang
>>>>>>>>> during
>>>>>>>>> suspend/resume.
>>>>
>>>> [Your email program wraps lines in cited text for some reason, making
>>>> the citation harder to read.]
>>>>
>>>>>>>>
>>>>>>>> I am still not clear, what “hang during suspend/resume” means. I
>>>>>>>> guess
>>>>>>>> suspending works fine? During resume (S3 or S0ix?), where does
>>>>>>>> it hang?
>>>>>>>> The system is functional, but there are only display problems?
>>>>> System freeze after suspend/resume.
>>>>
>>>> But you see certain messages still? At what point does it freeze
>>>> exactly? In the bug report you posted Linux messages.
>>>>
>>>>>>>>> The issue was initially reported on one system (Dell Precision
>>>>>>>>> 3660
>>>>>>>>> with
>>>>>>>>> BIOS version 0.14.81), but was later confirmed to affect at
>>>>>>>>> least 4
>>>>>>>>> Alder
>>>>>>>>> Lake based systems.
>>>>>>>>>
>>>>>>>>> Add extra check to disable ASPM on Intel Alder Lake based systems.
>>>>>>>>>
>>>>>>>>> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
>>>>>>>>> Link:
>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1885&data=05%7C01%7Crichard.gong%40amd.com%7C487aaa63098b462e146a08da230f2319%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637860845178176835%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=3IVldn05qNa2XVp1Lu58SriS8k9mk4U9K9p3F3IYPe0%3D&reserved=0
>>>>>>>>>
>>>>
>>>> Thank you Microsoft Outlook for keeping us safe. :(
>>>>
>>>>>>>>>
>>>>>>>>> Reported-by: kernel test robot
>>>>>>>>
>>>>>>>> This tag is a little confusing. Maybe clarify that it was for an
>>>>>>>> issue
>>>>>>>> in a previous patch iteration?
>>>>>
>>>>> I did describe in change-list version 3 below, which corrected the
>>>>> build
>>>>> error with W=1 option.
>>>>>
>>>>> It is not good idea to add the description for that to the commit
>>>>> message, this is why I add descriptions on change-list version 3.
>>>>
>>>> Do as you wish, but the current style is confusing, and readers of the
>>>> commit are going to think, the kernel test robot reported the problem
>>>> with AMD VI ASICs and Intel Alder Lake systems.
>>>>
>>>>>>>>
>>>>>>>>> Signed-off-by: Richard Gong
>>>>>>>>> ---
>>>>>>>>> v4: s/CONFIG_X86_64/CONFIG_X86
>>>>>>>>>        enhanced check logic
>>>>>>>>> v3: s/intel_core_asom_chk/aspm_support_quirk_check
>>>>>>>>>        correct build error with W=1 option
>>>>>>>>> v2: correct commit description
>>>>>>>>>        move the check from chip family to problematic platform
>>>>>>>>> ---
>>>>>>>>>     drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
>>>>>>>>>     1 file changed, 16 insertions(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>>> index 039b90cdc3bc..b33e0a9bee65 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>>> @@ -81,6 +81,10 @@
>>>>>>>>>     #include "mxgpu_vi.h"
>>>>>>>>>     #include "amdgpu_dm.h"
>>>>>>>>>
>>>>>>>>> +#if IS_ENABLED(CONFIG_X86)
>>>>>>>>> +#include
>>>>>>>>> +#endif
>>>>>>>>> +
>>>>>>>>>     #define ixPCIE_LC_L1_PM_SUBSTATE    0x100100C6
>>>>>>>>>     #define
>>>>>>>>> PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK
>>>>>>>>> 0x00000001L
>>>>>>>>>     #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK
>>>>>>>>> 0x00000002L
>>>>>>>>> @@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct
>>>>>>>>> amdgpu_device *adev)
>>>>>>>>>                 WREG32_PCIE(ixPCIE_LC_CNTL, data);
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>> +static bool aspm_support_quirk_check(void)
>>>>>>>>> +{
>>>>>>>>> +     if (IS_ENABLED(CONFIG_X86)) {
>>>>>>>>> +             struct cpuinfo_x86 *c = &cpu_data(0);
>>>>>>>>> +
>>>>>>>>> +             return !(c->x86 == 6 && c->x86_model ==
>>>>>>>>> INTEL_FAM6_ALDERLAKE);
>>>>>>>>> +     }
>>>>>>>>> +
>>>>>>>>> +     return true;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>>     static void vi_program_aspm(struct amdgpu_device *adev)
>>>>>>>>>     {
>>>>>>>>>         u32 data, data1, orig;
>>>>>>>>>         bool bL1SS = false;
>>>>>>>>>         bool bClkReqSupport = true;
>>>>>>>>>
>>>>>>>>> -     if (!amdgpu_device_should_use_aspm(adev))
>>>>>>>>> +     if (!amdgpu_device_should_use_aspm(adev) ||
>>>>>>>>> !aspm_support_quirk_check())
>>>>>>>>>                 return;
>>>>>>>>
>>>>>>>> Can users still forcefully enable ASPM with the parameter
>>>>>>>> `amdgpu.aspm`?
>>>>>>>>
>>>>> As Mario mentioned in a separate reply, we can't forcefully enable
>>>>> ASPM
>>>>> with the parameter 'amdgpu.aspm'.
>>>>
>>>> That would be a regression on systems where ASPM used to work. Hmm. I
>>>> guess, you could say, there are no such systems.
>>>>
>>>>>>>>>
>>>>>>>>>         if (adev->flags & AMD_IS_APU ||
>>>>>>>>
>>>>>>>> If I remember correctly, there were also newer cards, where ASPM
>>>>>>>> worked
>>>>>>>> with Intel Alder Lake, right? Can only the problematic
>>>>>>>> generations for
>>>>>>>> WX3200 and RX640 be excluded from ASPM?
>>>>>>>
>>>>>>> This patch only disables it for the generation that was
>>>>>>> problematic.
>>>>>>
>>>>>> Could that please be made clear in the commit message summary, and
>>>>>> message?
>>>>>
>>>>> Are you ok with the commit messages below?
>>>>
>>>> Please change the commit message summary. Maybe:
>>>>
>>>> drm/amdgpu: VI: Disable ASPM on Intel Alder Lake based systems
>>>>
>>>>> Active State Power Management (ASPM) feature is enabled since
>>>>> kernel 5.14.
>>>>>
>>>>> There are some AMD GFX cards (such as WX3200 and RX640) that won't
>>>>> work
>>>>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX
>>>>> cards as
>>>>> video/display output, Intel Alder Lake based systems will freeze after
>>>>> suspend/resume.
>>>>
>>>> Something like:
>>>>
>>>> On Intel Alder Lake based systems using ASPM with AMD GFX Volcanic
>>>> Islands (VI) cards, like WX3200 and RX640, graphics don’t initialize
>>>> when resuming from S0ix(?).
>>>>
>>>>
>>>>> The issue was initially reported on one system (Dell Precision 3660
>>>>> with
>>>>> BIOS version 0.14.81), but was later confirmed to affect at least 4
>>>>> Alder
>>>>> Lake based systems.
>>>>
>>>> Which ones?
>>>>
>>>>> Add extra check to disable ASPM on Intel Alder Lake based systems with
>>>>> problematic generation GFX cards.
>>>>
>>>> … with the problematic Volcanic Islands GFX cards.
>>>>
>>>>>>
>>>>>> Loosely related, is there a public (or internal issue) to analyze how
>>>>>> to get ASPM working for VI generation devices with Intel Alder Lake?
>>>>>
>>>>> As Alex mentioned, we need support from Intel. We don't have any
>>>>> update
>>>>> on that.
>>>>
>>>> It’d be great to get that fixed properly.
>>>>
>>>> Last thing, please don’t hate me, does Linux log, that ASPM is
>>>> disabled?
>>>
>>> I'm not sure what gets logged at the platform level with respect to
>>> ASPM, but whether or not the driver enables ASPM is tied to whether
>>> ASPM is allowed at the platform level or not so if the platform
>>> indicates that ASPM is not supported, the driver won't enable it.  The
>>> driver does not log whether ASPM is enabled or not if that is what you
>>> are asking.  As to whether or not it should, it comes down to how much
>>> stuff is worth indicating in the log.  The driver is already pretty
>>> chatty by driver standards.
>>
>> I specifically mean, Linux should log the quirks it applies. (As a
>> normal user, I’d also expect ASPM to work nowadays, so a message, that
>> it’s disabled would help a lot.)
>
> In general rule we shouldn't generate additional log unless something
> went wrong with the system.

Please run `dmesg` and see that your statement is false.
That’s what log levels are for, and in your case, it would be at least
error level. Also, I claim that something did indeed go wrong, because a
quirk had to be applied.

So please add a message at notice log level saying that ASPM gets
disabled:

     Disable ASPM on Alder Lake with Volcanic Islands card due to
     resume problems. System energy consumption might be higher than
     expected.


Kind regards,

Paul
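
To illustrate the request above, here is a minimal userspace sketch of a quirk check that reports when it disables ASPM. It is not the amdgpu code: `struct sketch_cpu`, `sketch_aspm_support_quirk_check()` and `sketch_program_aspm()` are hypothetical names, the model constant is assumed to match `INTEL_FAM6_ALDERLAKE`, and `fprintf(stderr, ...)` only stands in for whatever notice-level logging call the driver would use (presumably something like `dev_notice()`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: same value as the kernel's INTEL_FAM6_ALDERLAKE. */
#define SKETCH_INTEL_FAM6_ALDERLAKE 0x97

/* Hypothetical stand-in for the two cpuinfo_x86 fields the patch reads. */
struct sketch_cpu {
	unsigned int family;
	unsigned int model;
};

/* Returns true when ASPM may be programmed, false when the quirk applies. */
static bool sketch_aspm_support_quirk_check(const struct sketch_cpu *c)
{
	assert(c != NULL);
	return !(c->family == 6 && c->model == SKETCH_INTEL_FAM6_ALDERLAKE);
}

/*
 * Returns true if ASPM would be programmed. When the quirk disables it,
 * a notice is written so the user can see why ASPM is off.
 */
static bool sketch_program_aspm(const struct sketch_cpu *c)
{
	if (!sketch_aspm_support_quirk_check(c)) {
		/* stderr stands in for a notice-level kernel log call. */
		fprintf(stderr,
			"notice: Disable ASPM on Alder Lake with Volcanic Islands "
			"card due to resume problems. System energy consumption "
			"might be higher than expected.\n");
		return false;
	}

	/* The real driver would program the ASPM registers here. */
	return true;
}
```

Calling `sketch_program_aspm()` with family 6 and model 0x97 prints the notice once and returns false; any other CPU returns true without logging. Keeping the message next to the early return ties the log line to the quirk itself rather than to ASPM state in general.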