From: Alex Deucher
Date: Tue, 3 Jan 2023 09:17:24 -0500
Subject: Re: [PATCH v2 00/11] Recover from failure to probe GPU
To: "Lazar, Lijo"
Cc: Mario Limonciello, Javier Martinez Canillas, Alex Deucher,
    linux-kernel@vger.kernel.org, amd-gfx@lists.freedesktop.org,
    dri-devel@lists.freedesktop.org, Daniel Vetter,
    Carlos Soriano Sanchez, David Airlie, christian.koenig@amd.com
References: <20221228163102.468-1-mario.limonciello@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jan 3, 2023 at 5:10 AM Lazar, Lijo wrote:
>
>
> On 12/28/2022 10:00 PM, Mario Limonciello wrote:
> > One of the first things that KMS drivers do during initialization is
> > destroy the system firmware framebuffer by
> > means of `drm_aperture_remove_conflicting_pci_framebuffers`.
> >
> > This means that if for any reason the GPU failed to probe, the user
> > will be stuck with at best a screen frozen at the last thing that
> > was shown before the KMS driver continued its probe.
> >
> > The problem is most pronounced when new GPU support is introduced
> > because users will need to have a recent linux-firmware snapshot
> > on their system when they boot a kernel with matching support.
> >
> > However the problem is further exaggerated in the case of amdgpu because
> > it has migrated to "IP discovery" where amdgpu will attempt to load
> > on "ALL" AMD GPUs even if the driver is missing support for IP blocks
> > contained in that GPU.
> >
> > IP discovery requires some probing and isn't run until after the
> > framebuffer has been destroyed.
> >
> > This means a situation can occur where a user purchases a new GPU not
> > yet supported by a distribution and when booting the installer it will
> > "freeze" even if the distribution doesn't have the matching kernel support
> > for those IP blocks.
> >
> > The perfect example of this is Ubuntu 22.10 and the new dGPUs just
> > launched by AMD. The installation media ships with kernel 5.19 (which
> > has IP discovery) but the amdgpu support for those IP blocks landed in
> > kernel 6.0. The matching linux-firmware was released after 22.10's launch.
> > The screen will freeze without nomodeset. Even if a user manages to install
> > and then upgrades to kernel 6.0 after install they'll still have the
> > problem of missing firmware, and the same experience.
> >
> > This is quite jarring for users, particularly if they don't know
> > that they have to use "nomodeset" to install.
> >
> > To help the situation make changes to GPU discovery:
> > 1) Delay releasing the firmware framebuffer until after IP discovery has
> >    completed. This will help the situation of an older kernel that doesn't
> >    yet support the IP blocks probing a new GPU.
> > 2) Request loading all PSP, VCN, SDMA, MES and GC microcode into memory
> >    during IP discovery. This will help the situation of a new enough kernel
> >    for the IP discovery phase to otherwise pass but missing microcode from
> >    linux-firmware.git.
> >
> > Not all requested firmware will be loaded during IP discovery as some of it
> > will require larger driver architecture changes. For example SMU firmware
> > isn't loaded on certain products, but that's not known until later on when
> > the early_init phase of the SMU load occurs.
> >
> > v1->v2:
> >  * Take the suggestion from v1 thread to delay the framebuffer release until
> >    IP discovery is done. This patch is CC'd to stable so that older stable
> >    kernels with IP discovery won't try to probe unknown IP.
> >  * Drop changes to drm aperture.
> >  * Fetch SDMA, VCN, MES, GC and PSP microcode during IP discovery.
> >
>
> What is the gain here in just checking if firmware files are available?
> It can fail anywhere during sw_init and it's the same situation.

Other failures are presumably a bug or hardware issue. The missing
firmware would be a common issue when chips are first launched.
Thinking about it a bit more, another option might be to move the
calls to request_firmware() into the IP specific early_init()
functions and then move the drm_aperture release after early_init().
That would keep the firmware handling in the IPs and should still
happen early enough that we haven't messed with the hardware yet.

Alex

>
> Restricting IP FWs to IP specific files looks better to me than
> centralizing and creating interdependencies.
>
> Thanks,
> Lijo
>
> > Mario Limonciello (11):
> >   drm/amd: Delay removal of the firmware framebuffer
> >   drm/amd: Add a legacy mapping to "amdgpu_ucode_ip_version_decode"
> >   drm/amd: Convert SMUv11 microcode init to use
> >     `amdgpu_ucode_ip_version_decode`
> >   drm/amd: Convert SMU v13 to use `amdgpu_ucode_ip_version_decode`
> >   drm/amd: Request SDMA microcode during IP discovery
> >   drm/amd: Request VCN microcode during IP discovery
> >   drm/amd: Request MES microcode during IP discovery
> >   drm/amd: Request GFX9 microcode during IP discovery
> >   drm/amd: Request GFX10 microcode during IP discovery
> >   drm/amd: Request GFX11 microcode during IP discovery
> >   drm/amd: Request PSP microcode during IP discovery
> >
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |   8 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 590 +++++++++++++++++-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   6 -
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c       |   2 -
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c      |   9 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h      |   2 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c     | 208 ++++++
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c       |  85 +--
> >  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c        | 180 +-----
> >  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        |  64 +--
> >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c         | 143 +----
> >  drivers/gpu/drm/amd/amdgpu/mes_v10_1.c        |  28 -
> >  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c        |  25 +-
> >  drivers/gpu/drm/amd/amdgpu/psp_v10_0.c        | 106 +---
> >  drivers/gpu/drm/amd/amdgpu/psp_v11_0.c        | 165 +----
> >  drivers/gpu/drm/amd/amdgpu/psp_v12_0.c        | 102 +--
> >  drivers/gpu/drm/amd/amdgpu/psp_v13_0.c        |  82 ---
> >  drivers/gpu/drm/amd/amdgpu/psp_v13_0_4.c      |  36 --
> >  drivers/gpu/drm/amd/amdgpu/psp_v3_1.c         |  36 --
> >  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c        |  61 +-
> >  drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c        |  42 +-
> >  drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c        |  65 +-
> >  drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c        |  30 +-
> >  .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c    |  35 +-
> >  .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c    |  12 +-
> >  25 files changed, 919 insertions(+), 1203 deletions(-)
> >
> >
> > base-commit: de9a71e391a92841582ca3008e7b127a0b8ccf41
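
For readers following the thread, the ordering proposed in the reply
(request firmware from each IP block's early_init(), and release the
firmware framebuffer only after every early_init() has succeeded) can be
modeled in plain userspace C. This is only a sketch under stated
assumptions: `struct ip_block`, `struct probe_result`,
`fake_request_firmware()` and `probe_gpu()` are hypothetical stand-ins
invented for illustration, not amdgpu or DRM APIs, and nothing here
touches real hardware or a real framebuffer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an IP block; not the real amdgpu structure. */
struct ip_block {
    const char *name;     /* e.g. "psp", "sdma" (illustrative only) */
    const char *fw_name;  /* firmware file this block would request */
};

/* Outcome of the modeled probe. */
struct probe_result {
    int err;                    /* 0 on success, negative errno on failure */
    const char *failed_block;   /* block whose firmware was missing, or NULL */
    bool firmware_fb_released;  /* true only once the handover has happened */
};

/*
 * Hypothetical stand-in for request_firmware(): succeeds only if fw_name
 * appears in the list of files pretended to be installed.
 */
int fake_request_firmware(const char *fw_name,
                          const char *const *available, size_t navail)
{
    for (size_t i = 0; i < navail; i++) {
        if (strcmp(available[i], fw_name) == 0)
            return 0;
    }
    return -ENOENT;
}

/*
 * Model of the proposed ordering:
 *   1. each block's "early_init" requests its own firmware;
 *   2. the firmware framebuffer is released only if all of them succeed.
 * On failure the function returns before step 2, so the console survives.
 */
struct probe_result probe_gpu(const struct ip_block *blocks, size_t nblocks,
                              const char *const *available, size_t navail)
{
    struct probe_result res = { 0, NULL, false };

    assert(blocks != NULL || nblocks == 0);

    for (size_t i = 0; i < nblocks; i++) {
        int err = fake_request_firmware(blocks[i].fw_name, available, navail);

        if (err) {
            res.err = err;
            res.failed_block = blocks[i].name;
            return res; /* framebuffer has not been released yet */
        }
    }

    /* All firmware was found: only now give up the firmware framebuffer. */
    res.firmware_fb_released = true;
    return res;
}
```

Called with a block whose firmware is not in `available`, `probe_gpu()`
returns `-ENOENT` with `firmware_fb_released` still false, which is the
case where the user keeps a usable console instead of a frozen screen.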