From: Alex Deucher
Date: Fri, 1 Dec 2023 11:55:36 -0500
Subject: Re: Radeon regression in 6.6 kernel
To: Luben Tuikov
Cc: Phillip Susi, Linux regressions mailing list, Christian König,
 linux-kernel@vger.kernel.org, "amd-gfx@lists.freedesktop.org",
 dri-devel@lists.freedesktop.org, Alex Deucher, Christian König,
 Danilo Krummrich

Phillip,

Can you test this patch? I was not able to repro the issue on the
navi2x card I had handy, but I think it should fix it.

Thanks,

Alex

On Wed, Nov 29, 2023 at 3:49 PM Alex Deucher wrote:
>
> On Wed, Nov 29, 2023 at 3:10 PM Alex Deucher wrote:
> >
> > Actually I think I see the problem. I'll try and send out a patch
> > later today to test.
>
> Does the attached patch fix it?
>
> Alex
>
> >
> > Alex
> >
> > On Wed, Nov 29, 2023 at 1:52 PM Alex Deucher wrote:
> > >
> > > On Wed, Nov 29, 2023 at 11:41 AM Luben Tuikov wrote:
> > > >
> > > > On 2023-11-29 10:22, Alex Deucher wrote:
> > > > > On Wed, Nov 29, 2023 at 8:50 AM Alex Deucher wrote:
> > > > >>
> > > > >> On Tue, Nov 28, 2023 at 11:45 PM Luben Tuikov wrote:
> > > > >>>
> > > > >>> On 2023-11-28 17:13, Alex Deucher wrote:
> > > > >>>> On Mon, Nov 27, 2023 at 6:24 PM Phillip Susi wrote:
> > > > >>>>>
> > > > >>>>> Alex Deucher writes:
> > > > >>>>>
> > > > >>>>>>> In that case those are the already known problems with the scheduler
> > > > >>>>>>> changes, aren't they?
> > > > >>>>>>
> > > > >>>>>> Yes. Those changes went into 6.7 though, not 6.6 AFAIK. Maybe I'm
> > > > >>>>>> misunderstanding what the original report was actually testing.
> > > > >>>>>> If it was 6.7, then try reverting:
> > > > >>>>>> 56e449603f0ac580700621a356d35d5716a62ce5
> > > > >>>>>> b70438004a14f4d0f9890b3297cd66248728546c
> > > > >>>>>
> > > > >>>>> At some point it was suggested that I file a gitlab issue, but I took
> > > > >>>>> this to mean it was already known and being worked on. -rc3 came out
> > > > >>>>> today and still has the problem. Is there a known issue I could track?
> > > > >>>>>
> > > > >>>>
> > > > >>>> At this point, unless there are any objections, I think we should just
> > > > >>>> revert the two patches
> > > > >>> Uhm, no.
> > > > >>>
> > > > >>> Why "the two" patches?
> > > > >>>
> > > > >>> This email, part of this thread,
> > > > >>>
> > > > >>> https://lore.kernel.org/all/87r0kircdo.fsf@vps.thesusis.net/
> > > > >>>
> > > > >>> clearly states that reverting *only* this commit,
> > > > >>> 56e449603f0ac5 drm/sched: Convert the GPU scheduler to variable number of run-queues
> > > > >>> *does not* mitigate the failed suspend. (Furthermore, this commit doesn't really change
> > > > >>> anything operational, other than using an allocated array, instead of a static one, in DRM,
> > > > >>> while the 2nd patch is solely contained within the amdgpu driver code.)
> > > > >>>
> > > > >>> Leaving us with only this change,
> > > > >>> b70438004a14f4 drm/amdgpu: move buffer funcs setting up a level
> > > > >>> to be at fault, as the kernel log attached in the linked email above shows.
> > > > >>>
> > > > >>> The conclusion is that only b70438004a14f4 needs reverting.
> > > > >>
> > > > >> b70438004a14f4 was a fix for 56e449603f0ac5. Without b70438004a14f4,
> > > > >> 56e449603f0ac5 breaks amdgpu.
> > > > >
> > > > > We can try and re-enable it in the next kernel.
> > > > > I'm just not sure we'll be able to fix this in time for 6.7 with
> > > > > the holidays and all and I don't want to cause a lot of scheduler
> > > > > churn at the end of the 6.7 cycle if we hold off and try and fix
> > > > > it. Reverting seems like the best short term solution.
> > > >
> > > > A lot of subsequent code has come in since commit 56e449603f0ac5, as it opened
> > > > the opportunity for a 1-to-1 relationship between an entity and a scheduler.
> > > > (Should've always been the case, from the outset. Not sure why it was coded as
> > > > a fixed-size array.)
> > > >
> > > > Given that commit 56e449603f0ac5 has nothing to do with amdgpu, and the problem
> > > > is wholly contained in amdgpu, and no other driver has this problem, there is
> > > > no reason to have to "churn", i.e. go back and forth in DRM, only to cover up
> > > > an init bug in amdgpu. See the response I just sent in @this thread:
> > > > https://lore.kernel.org/r/05007cb0-871e-4dc7-af58-1351f4ba43e2@gmail.com
> > > >
> > > > And it's not like this issue is unknown. I first posted about it on 2023-10-16.
> > > >
> > > > Ideally, amdgpu would just fix their init code.
> > >
> > > You can't make changes to core code that break other drivers.
> > > Arguably 56e449603f0ac5 should not have gone in in the first place if
> > > it broke amdgpu. b70438004a14f4 was the code to fix amdgpu's init
> > > code, but as a side effect it seems to have broken suspend for some
> > > users.
> > >
> > > Alex