From: Luben Tuikov
To: Alex Deucher
Cc: Phillip Susi, Linux regressions mailing list, Christian König,
    linux-kernel@vger.kernel.org, "amd-gfx@lists.freedesktop.org",
    dri-devel@lists.freedesktop.org, Alex Deucher, Christian König,
    Danilo Krummrich
Subject: Re: Radeon regression in 6.6 kernel
Date: Wed, 29 Nov 2023 11:20:58 -0500
Message-ID: <05007cb0-871e-4dc7-af58-1351f4ba43e2@gmail.com>
References: <87edgv4x3i.fsf@vps.thesusis.net>
    <559d0fa5-953a-4a97-b03b-5eb1287c83d8@leemhuis.info>
    <96e2e13c-f01c-4baf-a9a3-cbaa48fb10c7@amd.com>
    <87jzq2ixtm.fsf@vps.thesusis.net>
    <95fe9b5b-05ce-4462-9973-9aca306bc44f@gmail.com>

On 2023-11-29 08:50, Alex Deucher wrote:
> On Tue, Nov 28, 2023 at 11:45 PM Luben Tuikov wrote:
>>
>> On 2023-11-28 17:13, Alex Deucher wrote:
>>> On Mon, Nov 27, 2023 at 6:24 PM Phillip Susi wrote:
>>>>
>>>> Alex Deucher writes:
>>>>
>>>>>> In that case those are the already known problems with the scheduler
>>>>>> changes, aren't they?
>>>>>
>>>>> Yes. Those changes went into 6.7 though, not 6.6 AFAIK. Maybe I'm
>>>>> misunderstanding what the original report was actually testing. If it
>>>>> was 6.7, then try reverting:
>>>>> 56e449603f0ac580700621a356d35d5716a62ce5
>>>>> b70438004a14f4d0f9890b3297cd66248728546c
>>>>
>>>> At some point it was suggested that I file a gitlab issue, but I took
>>>> this to mean it was already known and being worked on. -rc3 came out
>>>> today and still has the problem. Is there a known issue I could track?
>>>>
>>>
>>> At this point, unless there are any objections, I think we should just
>>> revert the two patches
>> Uhm, no.
>>
>> Why "the two" patches?
>>
>> This email, part of this thread,
>>
>> https://lore.kernel.org/all/87r0kircdo.fsf@vps.thesusis.net/
>>
>> clearly states that reverting *only* this commit,
>> 56e449603f0ac5 drm/sched: Convert the GPU scheduler to variable number of run-queues
>> *does not* mitigate the failed suspend. (Furthermore, this commit doesn't really change
>> anything operational, other than using an allocated array, instead of a static one, in DRM,
>> while the 2nd patch is solely contained within the amdgpu driver code.)
>>
>> Leaving us with only this change,
>> b70438004a14f4 drm/amdgpu: move buffer funcs setting up a level
>> to be at fault, as the kernel log attached in the linked email above shows.
>>
>> The conclusion is that only b70438004a14f4 needs reverting.
>
> b70438004a14f4 was a fix for 56e449603f0ac5. Without b70438004a14f4,
> 56e449603f0ac5 breaks amdgpu.

It doesn't "break" it, amdgpu just needs to be fixed.

I know we put in a Fixes tag in
b70438004a14f4 "drm/amdgpu: move buffer funcs setting up a level"
pointing to 56e449603f0ac5 "drm/sched: Convert the GPU scheduler to variable number of run-queues",
but given the testing Phillip has done, the culprit is wholly contained in
the amdgpu driver code. No other driver has this problem since commit 56e449603f0ac5.

The Fixes tag in b70438004a14f4 "drm/amdgpu: move buffer funcs setting up a level"
should've ideally pointed to an amdgpu-driver code commit only (perhaps an old-old commit),
and I was a bit uncomfortable putting in a Fixes tag which pointed to drm code, but we did it
so that the amdgpu commit follows the changes in DRM. In retrospect, the Fixes tag should've
pointed to an amdgpu-driver commit from when the amdgpu code was originally written.

I remember that the problem was really that amdgpu called drm_sched_entity_init()
in amdgpu_ttm_set_buffer_funcs_status() without actually having initialized the scheduler
used therein.

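To make the failure mode concrete, here is a rough user-space sketch of the difference between an embedded (static) run-queue array and an allocated one. It is only a model: every name prefixed with model_ or MODEL_ is hypothetical, none of this is the DRM scheduler's actual types or API, and the dynamic variant returns an error at the point where unchecked code would dereference a NULL pointer.

```c
#include <stddef.h>
#include <stdlib.h>

#define MODEL_NUM_RQS 4	/* hypothetical number of priorities */

struct model_rq {
	int nr_entities;
};

/* Old-style layout: run-queues are embedded in the scheduler itself. */
struct model_sched_static {
	struct model_rq sched_rq[MODEL_NUM_RQS];
};

/* New-style layout: the run-queue array only exists after init. */
struct model_sched_dynamic {
	unsigned int num_rqs;
	struct model_rq **sched_rq;
};

/* Releases whatever model_sched_dynamic_init() allocated. */
static void model_sched_dynamic_fini(struct model_sched_dynamic *s)
{
	unsigned int i;

	if (!s->sched_rq)
		return;
	for (i = 0; i < s->num_rqs; i++)
		free(s->sched_rq[i]);
	free(s->sched_rq);
	s->sched_rq = NULL;
	s->num_rqs = 0;
}

/* The step that must run before the dynamic layout is usable. */
static int model_sched_dynamic_init(struct model_sched_dynamic *s,
				    unsigned int n)
{
	unsigned int i;

	s->sched_rq = calloc(n, sizeof(*s->sched_rq));
	if (!s->sched_rq)
		return -1;
	s->num_rqs = n;
	for (i = 0; i < n; i++) {
		s->sched_rq[i] = calloc(1, sizeof(*s->sched_rq[i]));
		if (!s->sched_rq[i]) {
			model_sched_dynamic_fini(s);
			return -1;
		}
	}
	return 0;
}

/* Works even on a zeroed, never-initialized scheduler: storage is embedded. */
static int model_attach_static(struct model_sched_static *s, unsigned int prio)
{
	if (prio >= MODEL_NUM_RQS)
		return -1;
	s->sched_rq[prio].nr_entities++;
	return 0;
}

/*
 * On a never-initialized scheduler sched_rq is NULL. The model reports
 * an error here; code without this check would dereference NULL.
 */
static int model_attach_dynamic(struct model_sched_dynamic *s,
				unsigned int prio)
{
	if (!s->sched_rq || prio >= s->num_rqs)
		return -1;
	s->sched_rq[prio]->nr_entities++;
	return 0;
}
```

In this model, attaching to a zero-filled model_sched_static succeeds, while attaching to a zero-filled model_sched_dynamic fails until model_sched_dynamic_init() has run. That ordering dependency is the one described above.
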
For instance, the code before commit b70438004a14f4, looked like this:

void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
{
	struct ttm_resource_manager *man = ttm_manager_type(&adev->mman.bdev, TTM_PL_VRAM);
	uint64_t size;
	int r;

	if (!adev->mman.initialized || amdgpu_in_reset(adev) ||
	    adev->mman.buffer_funcs_enabled == enable)
		return;

	if (enable) {
		struct amdgpu_ring *ring;
		struct drm_gpu_scheduler *sched;

		ring = adev->mman.buffer_funcs_ring;
		sched = &ring->sched;                          <-- LT: No one has initialized this scheduler
		r = drm_sched_entity_init(&adev->mman.entity,  <-- Oopses, now that sched->sched_rq is not a static array
					  DRM_SCHED_PRIORITY_KERNEL, &sched,
					  1, NULL);
		if (r) {
			DRM_ERROR("Failed setting up TTM BO move entity (%d)\n",
				  r);
			return;
		}

Before commit 56e449603f0ac5, amdgpu was getting away with this, because the sched->sched_rq
was a static array.

Ideally, amdgpu code would be fixed.
-- 
Regards,
Luben