2022-10-19 20:43:47

by Deucher, Alexander

Subject: RE: Linux 6.1-rc1 drm/amdgpu regression

[AMD Official Use Only - General]

> -----Original Message-----
> From: Shuah Khan <[email protected]>
> Sent: Wednesday, October 19, 2022 4:00 PM
> To: Deucher, Alexander <[email protected]>
> Cc: Linus Torvalds <[email protected]>; Shuah Khan
> <[email protected]>; [email protected]
> Subject: Linux 6.1-rc1 drm/amdgpu regression
>
> Hi Alex,
>
> I am seeing the same problem I sent reverts for on 5.10.147 on Linux 6.1-rc1
> on my laptop with AMD Ryzen 7 PRO 5850U with Radeon Graphics.
>
> commit e3163bc8ffdfdb405e10530b140135b2ee487f89
> Author: Alex Deucher <[email protected]>
> Date: Fri Sep 9 11:53:27 2022 -0400
>
> drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega
>
> I see that the following has been reverted in Linux 6.1-rc1
>
> commit 66f99628eb24409cb8feb5061f78283c8b65f820
> Author: Hamza Mahfooz <[email protected]>
> Date: Tue Sep 6 15:01:49 2022 -0400
>
> drm/amdgpu: use dirty framebuffer helper
>
> However I still see the following filling dmesg and system is unusable.
> For now I switched back to Linux 6.0 as this is my primary system.
>
> [drm] Fence fallback timer expired on ring sdma0
> [drm] Fence fallback timer expired on ring gfx
> [drm] Fence fallback timer expired on ring sdma0
> [drm] Fence fallback timer expired on ring gfx
> [drm] Fence fallback timer expired on ring sdma0
> [drm] Fence fallback timer expired on ring sdma0
> [drm] Fence fallback timer expired on ring sdma0
> [drm] Fence fallback timer expired on ring gfx
>
> Please let me know if I should send revert for this for the mainline as well.
>

Can you file a bug report (https://gitlab.freedesktop.org/drm/amd/-/issues) and attach your dmesg output? I'd like to try and repro the issue if I can and provide some patches to test. I'd like to avoid reverting the patch as that will break the driver for users using vega dGPUs. If we revert this patch we'll need to revert the following patches as well to avoid a broken driver for a bunch of AMD GPUs:
dc1d85cb790f2091eea074cee24a704b2d6c4a06
e3163bc8ffdfdb405e10530b140135b2ee487f89
a8671493d2074950553da3cf07d1be43185ef6c6
8795e182b02dc87e343c79e73af6b8b7f9c5e635

Thanks,

Alex


2022-10-19 21:51:52

by Shuah Khan

Subject: Re: Linux 6.1-rc1 drm/amdgpu regression

On 10/19/22 14:27, Deucher, Alexander wrote:
> [AMD Official Use Only - General]
>
>> -----Original Message-----
>> From: Shuah Khan <[email protected]>
>> Sent: Wednesday, October 19, 2022 4:00 PM
>> To: Deucher, Alexander <[email protected]>
>> Cc: Linus Torvalds <[email protected]>; Shuah Khan
>> <[email protected]>; [email protected]
>> Subject: Linux 6.1-rc1 drm/amdgpu regression
>>
>> Hi Alex,
>>
>> I am seeing the same problem I sent reverts for on 5.10.147 on Linux 6.1-rc1
>> on my laptop with AMD Ryzen 7 PRO 5850U with Radeon Graphics.
>>
>> commit e3163bc8ffdfdb405e10530b140135b2ee487f89
>> Author: Alex Deucher <[email protected]>
>> Date: Fri Sep 9 11:53:27 2022 -0400
>>
>> drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega
>>
>> I see that the following has been reverted in Linux 6.1-rc1
>>
>> commit 66f99628eb24409cb8feb5061f78283c8b65f820
>> Author: Hamza Mahfooz <[email protected]>
>> Date: Tue Sep 6 15:01:49 2022 -0400
>>
>> drm/amdgpu: use dirty framebuffer helper
>>
>> However I still see the following filling dmesg and system is unusable.
>> For now I switched back to Linux 6.0 as this is my primary system.
>>
>> [drm] Fence fallback timer expired on ring sdma0
>> [drm] Fence fallback timer expired on ring gfx
>> [drm] Fence fallback timer expired on ring sdma0
>> [drm] Fence fallback timer expired on ring gfx
>> [drm] Fence fallback timer expired on ring sdma0
>> [drm] Fence fallback timer expired on ring sdma0
>> [drm] Fence fallback timer expired on ring sdma0
>> [drm] Fence fallback timer expired on ring gfx
>>
>> Please let me know if I should send revert for this for the mainline as well.
>>
>
> Can you file a bug report (https://gitlab.freedesktop.org/drm/amd/-/issues) and attach your dmesg output? I'd like to try and repro the issue if I can and provide some patches to test. I'd like to avoid reverting the patch as that will break the driver for users using vega dGPUs.

Makes sense. I will file the bug and attach dmesg. Since this is my
primary system, there will be some delay in getting this info to you
and in testing any patches you provide.

thanks,
-- Shuah

2022-10-19 21:54:37

by Deucher, Alexander

Subject: RE: Linux 6.1-rc1 drm/amdgpu regression

[Public]

> -----Original Message-----
> From: Shuah Khan <[email protected]>
> Sent: Wednesday, October 19, 2022 5:00 PM
> To: Deucher, Alexander <[email protected]>
> Cc: Linus Torvalds <[email protected]>; linux-
> [email protected]; Shuah Khan <[email protected]>
> Subject: Re: Linux 6.1-rc1 drm/amdgpu regression
>
> On 10/19/22 14:27, Deucher, Alexander wrote:
> > [AMD Official Use Only - General]
> >
> >> -----Original Message-----
> >> From: Shuah Khan <[email protected]>
> >> Sent: Wednesday, October 19, 2022 4:00 PM
> >> To: Deucher, Alexander <[email protected]>
> >> Cc: Linus Torvalds <[email protected]>; Shuah Khan
> >> <[email protected]>; [email protected]
> >> Subject: Linux 6.1-rc1 drm/amdgpu regression
> >>
> >> Hi Alex,
> >>
> >> I am seeing the same problem I sent reverts for on 5.10.147 on Linux
> >> 6.1-rc1 on my laptop with AMD Ryzen 7 PRO 5850U with Radeon Graphics.
> >>
> >> commit e3163bc8ffdfdb405e10530b140135b2ee487f89
> >> Author: Alex Deucher <[email protected]>
> >> Date: Fri Sep 9 11:53:27 2022 -0400
> >>
> >> drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for
> >> vega
> >>
> >> I see that the following has been reverted in Linux 6.1-rc1
> >>
> >> commit 66f99628eb24409cb8feb5061f78283c8b65f820
> >> Author: Hamza Mahfooz <[email protected]>
> >> Date: Tue Sep 6 15:01:49 2022 -0400
> >>
> >> drm/amdgpu: use dirty framebuffer helper
> >>
> >> However I still see the following filling dmesg and system is unusable.
> >> For now I switched back to Linux 6.0 as this is my primary system.
> >>
> >> [drm] Fence fallback timer expired on ring sdma0
> >> [drm] Fence fallback timer expired on ring gfx
> >> [drm] Fence fallback timer expired on ring sdma0
> >> [drm] Fence fallback timer expired on ring gfx
> >> [drm] Fence fallback timer expired on ring sdma0
> >> [drm] Fence fallback timer expired on ring sdma0
> >> [drm] Fence fallback timer expired on ring sdma0
> >> [drm] Fence fallback timer expired on ring gfx
> >>
> >> Please let me know if I should send revert for this for the mainline as well.
> >>
> >
> > Can you file a bug report
> > (https://gitlab.freedesktop.org/drm/amd/-/issues) and attach your dmesg
> > output? I'd like to try and repro the issue if I can and provide some
> > patches to test. I'd like to avoid reverting the patch as that will break
> > the driver for users using vega dGPUs.
>
> Makes sense. I will file the bug and attach dmesg. Since this is my primary
> system, there will be some delay in getting this info to you and in testing
> any patches you provide.
>

Actually I think I see what's wrong. Can you try the attached patch?

Alex


Attachments:
0001-drm-amdgpu-fix-sdma-doorbell-init-ordering-on-APUs.patch (3.25 kB)

2022-10-20 01:35:50

by Shuah Khan

Subject: Re: Linux 6.1-rc1 drm/amdgpu regression

On 10/19/22 15:24, Deucher, Alexander wrote:
> [Public]
>
>> -----Original Message-----
>> From: Shuah Khan <[email protected]>
>> Sent: Wednesday, October 19, 2022 5:00 PM
>> To: Deucher, Alexander <[email protected]>
>> Cc: Linus Torvalds <[email protected]>; linux-
>> [email protected]; Shuah Khan <[email protected]>
>> Subject: Re: Linux 6.1-rc1 drm/amdgpu regression
>>
>> On 10/19/22 14:27, Deucher, Alexander wrote:
>>> [AMD Official Use Only - General]
>>>
>>>> -----Original Message-----
>>>> From: Shuah Khan <[email protected]>
>>>> Sent: Wednesday, October 19, 2022 4:00 PM
>>>> To: Deucher, Alexander <[email protected]>
>>>> Cc: Linus Torvalds <[email protected]>; Shuah Khan
>>>> <[email protected]>; [email protected]
>>>> Subject: Linux 6.1-rc1 drm/amdgpu regression
>>>>
>>>> Hi Alex,
>>>>
>>>> I am seeing the same problem I sent reverts for on 5.10.147 on Linux
>>>> 6.1-rc1 on my laptop with AMD Ryzen 7 PRO 5850U with Radeon Graphics.
>>>>
>>>> commit e3163bc8ffdfdb405e10530b140135b2ee487f89
>>>> Author: Alex Deucher <[email protected]>
>>>> Date: Fri Sep 9 11:53:27 2022 -0400
>>>>
>>>> drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for
>>>> vega
>>>>
>>>> I see that the following has been reverted in Linux 6.1-rc1
>>>>
>>>> commit 66f99628eb24409cb8feb5061f78283c8b65f820
>>>> Author: Hamza Mahfooz <[email protected]>
>>>> Date: Tue Sep 6 15:01:49 2022 -0400
>>>>
>>>> drm/amdgpu: use dirty framebuffer helper
>>>>
>>>> However I still see the following filling dmesg and system is unusable.
>>>> For now I switched back to Linux 6.0 as this is my primary system.
>>>>
>>>> [drm] Fence fallback timer expired on ring sdma0
>>>> [drm] Fence fallback timer expired on ring gfx
>>>> [drm] Fence fallback timer expired on ring sdma0
>>>> [drm] Fence fallback timer expired on ring gfx
>>>> [drm] Fence fallback timer expired on ring sdma0
>>>> [drm] Fence fallback timer expired on ring sdma0
>>>> [drm] Fence fallback timer expired on ring sdma0
>>>> [drm] Fence fallback timer expired on ring gfx
>>>>
>>>> Please let me know if I should send revert for this for the mainline as well.
>>>>
>>>
>>> Can you file a bug report
>>> (https://gitlab.freedesktop.org/drm/amd/-/issues) and attach your dmesg
>>> output? I'd like to try and repro the issue if I can and provide some
>>> patches to test. I'd like to avoid reverting the patch as that will break
>>> the driver for users using vega dGPUs.
>>
>> Makes sense. I will file the bug and attach dmesg. Since this is my primary
>> system, there will be some delay in getting this info to you and in testing
>> any patches you provide.
>>
>
> Actually I think I see what's wrong. Can you try the attached patch?
>

This patch worked. Clean boot with no warnings or timer expiry messages
from drm/amdgpu.

thanks,
-- Shuah
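
For anyone else testing the patch, the "clean boot" check above can be
made a little more systematic. The following is only a minimal sketch,
not part of the patch or of any amdgpu tooling: it assumes the kernel log
has been saved as text, and the function name count_fence_timeouts and
the SAMPLE string are hypothetical names used for illustration. The
message text itself is copied from the report earlier in this thread.

```python
import re
from collections import Counter

# Message text taken from the report in this thread. The \s+ between words
# tolerates extra whitespace or line wraps in a saved log; any timestamp
# prefix before "[drm]" is simply ignored by the search.
FENCE_TIMEOUT = re.compile(
    r"\[drm\]\s+Fence\s+fallback\s+timer\s+expired\s+on\s+ring\s+(\w+)"
)


def count_fence_timeouts(log_text: str) -> Counter:
    """Count fallback-timer expiry messages per ring name (hypothetical helper)."""
    return Counter(FENCE_TIMEOUT.findall(log_text))


# Sample input: the eight messages quoted in the original report.
SAMPLE = "\n".join(
    f"[drm] Fence fallback timer expired on ring {ring}"
    for ring in ["sdma0", "gfx", "sdma0", "gfx", "sdma0", "sdma0", "sdma0", "gfx"]
)

counts = count_fence_timeouts(SAMPLE)
if counts:
    for ring, n in sorted(counts.items()):
        print(f"{ring}: {n} fallback timer expiries")
else:
    print("no fence fallback timer messages found")
```

An empty result on a patched kernel would match the clean boot reported
above; it does not by itself prove that the driver is otherwise healthy.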