From: Alex Deucher
Date: Wed, 29 Nov 2023 15:10:14 -0500
Subject: Re: Radeon regression in 6.6 kernel
To: Luben Tuikov
Cc: Phillip Susi, Linux regressions mailing list, Christian König, linux-kernel@vger.kernel.org, "amd-gfx@lists.freedesktop.org", dri-devel@lists.freedesktop.org, Alex Deucher, Christian König,
    Danilo Krummrich

Actually I think I see the problem. I'll try and send out a patch
later today to test.

Alex

On Wed, Nov 29, 2023 at 1:52 PM Alex Deucher wrote:
>
> On Wed, Nov 29, 2023 at 11:41 AM Luben Tuikov wrote:
> >
> > On 2023-11-29 10:22, Alex Deucher wrote:
> > > On Wed, Nov 29, 2023 at 8:50 AM Alex Deucher wrote:
> > >>
> > >> On Tue, Nov 28, 2023 at 11:45 PM Luben Tuikov wrote:
> > >>>
> > >>> On 2023-11-28 17:13, Alex Deucher wrote:
> > >>>> On Mon, Nov 27, 2023 at 6:24 PM Phillip Susi wrote:
> > >>>>>
> > >>>>> Alex Deucher writes:
> > >>>>>
> > >>>>>>> In that case those are the already known problems with the scheduler
> > >>>>>>> changes, aren't they?
> > >>>>>>
> > >>>>>> Yes. Those changes went into 6.7 though, not 6.6 AFAIK. Maybe I'm
> > >>>>>> misunderstanding what the original report was actually testing. If it
> > >>>>>> was 6.7, then try reverting:
> > >>>>>> 56e449603f0ac580700621a356d35d5716a62ce5
> > >>>>>> b70438004a14f4d0f9890b3297cd66248728546c
> > >>>>>
> > >>>>> At some point it was suggested that I file a gitlab issue, but I took
> > >>>>> this to mean it was already known and being worked on. -rc3 came out
> > >>>>> today and still has the problem. Is there a known issue I could track?
> > >>>>>
> > >>>>
> > >>>> At this point, unless there are any objections, I think we should just
> > >>>> revert the two patches
> > >>> Uhm, no.
> > >>>
> > >>> Why "the two" patches?
> > >>>
> > >>> This email, part of this thread,
> > >>>
> > >>> https://lore.kernel.org/all/87r0kircdo.fsf@vps.thesusis.net/
> > >>>
> > >>> clearly states that reverting *only* this commit,
> > >>> 56e449603f0ac5 drm/sched: Convert the GPU scheduler to variable number of run-queues
> > >>> *does not* mitigate the failed suspend. (Furthermore, this commit doesn't really change
> > >>> anything operational, other than using an allocated array, instead of a static one, in DRM,
> > >>> while the 2nd patch is solely contained within the amdgpu driver code.)
> > >>>
> > >>> Leaving us with only this change,
> > >>> b70438004a14f4 drm/amdgpu: move buffer funcs setting up a level
> > >>> to be at fault, as the kernel log attached in the linked email above shows.
> > >>>
> > >>> The conclusion is that only b70438004a14f4 needs reverting.
> > >>
> > >> b70438004a14f4 was a fix for 56e449603f0ac5. Without b70438004a14f4,
> > >> 56e449603f0ac5 breaks amdgpu.
> > >
> > > We can try and re-enable it in the next kernel. I'm just not sure
> > > we'll be able to fix this in time for 6.7 with the holidays and all
> > > and I don't want to cause a lot of scheduler churn at the end of the
> > > 6.7 cycle if we hold off and try and fix it. Reverting seems like the
> > > best short term solution.
> >
> > A lot of subsequent code has come in since commit 56e449603f0ac5, as it opened
> > the opportunity for a 1-to-1 relationship between an entity and a scheduler.
> > (Should've always been the case, from the outset. Not sure why it was coded as
> > a fixed-size array.)
> >
> > Given that commit 56e449603f0ac5 has nothing to do with amdgpu, and the problem
> > is wholly contained in amdgpu, and no other driver has this problem, there is
> > no reason to have to "churn", i.e. go back and forth in DRM, only to cover up
> > an init bug in amdgpu. See the response I just sent in @this thread:
> > https://lore.kernel.org/r/05007cb0-871e-4dc7-af58-1351f4ba43e2@gmail.com
> >
> > And it's not like this issue is unknown. I first posted about it on 2023-10-16.
> >
> > Ideally, amdgpu would just fix their init code.
>
> You can't make changes to core code that break other drivers.
> Arguably 56e449603f0ac5 should not have gone in in the first place if
> it broke amdgpu. b70438004a14f4 was the code to fix amdgpu's init
> code, but as a side effect it seems to have broken suspend for some
> users.
>
> Alex