2022-02-14 14:33:38

by Salvatore Bonaccorso

Subject: Regression from 3c196f056666 ("drm/amdgpu: always reset the asic in suspend (v2)") on suspend?

Hi Alex, hi all

In Debian we got a regression report from Dominique Dumont (CC'ed) in
https://bugs.debian.org/1005005: after an update to a 5.15.15-based
kernel, his machine no longer suspends correctly. After the screen goes
black as usual, it comes back. The Debian bug above contains a trace.

Dominique confirmed that this issue persisted after updating to 5.16.7;
furthermore, he bisected the issue and found

3c196f05666610912645c7c5d9107706003f67c3 is the first bad commit
commit 3c196f05666610912645c7c5d9107706003f67c3
Author: Alex Deucher <[email protected]>
Date: Fri Nov 12 11:25:30 2021 -0500

drm/amdgpu: always reset the asic in suspend (v2)

[ Upstream commit daf8de0874ab5b74b38a38726fdd3d07ef98a7ee ]

If the platform suspend happens to fail and the power rail
is not turned off, the GPU will be in an unknown state on
resume, so reset the asic so that it will be in a known
good state on resume even if the platform suspend failed.

v2: handle s0ix

Acked-by: Luben Tuikov <[email protected]>
Acked-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>

drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

to be the first bad commit, see https://bugs.debian.org/1005005#34 .

Does this ring any bell? Any idea on the problem?

Regards,
Salvatore


Subject: Re: Regression from 3c196f056666 ("drm/amdgpu: always reset the asic in suspend (v2)") on suspend?


[TLDR: I'm adding the regression report below to regzbot, the Linux
kernel regression tracking bot; all text you find below is compiled from
a few template paragraphs you might have encountered already
from similar mails.]

Hi, this is your Linux kernel regression tracker speaking.

CCing the regression mailing list, as it should be in the loop for all
regressions, as explained here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html

To be sure this issue doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, my Linux kernel regression tracking bot:

#regzbot ^introduced 3c196f056666
#regzbot title amdgfx: suspend stopped working
#regzbot ignore-activity
#regzbot link: https://bugs.debian.org/1005005

Reminder for developers: when fixing the issue, please add a 'Link:'
tag pointing to the report (the mail quoted above) using
lore.kernel.org/r/, as explained in
'Documentation/process/submitting-patches.rst' and
'Documentation/process/5.Posting.rst'. This allows the bot to connect
the report with any patches posted or committed to fix the issue; this
again allows the bot to show the current status of regressions and
automatically resolve the issue when the fix hits the right tree.

I'm sending this to everyone that got the initial report, to make them
aware of the tracking. I also hope that messages like this motivate
people to directly get at least the regression mailing list and ideally
even regzbot involved when dealing with regressions, as messages like
this wouldn't be needed then.

Don't worry, I'll send further messages wrt this regression just to
the lists (with a tag in the subject so people can filter them away), if
they are relevant just for regzbot. With a bit of luck no such messages
will be needed anyway.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I'm getting a lot of
reports on my table. I can only look briefly into most of them and lack
knowledge about most of the areas they concern. I thus unfortunately
will sometimes get things wrong or miss something important. I hope
that's not the case here; if you think it is, don't hesitate to tell me
in a public reply, it's in everyone's interest to set the public record
straight.


On 12.02.22 19:23, Salvatore Bonaccorso wrote:
> Hi Alex, hi all
>
> In Debian we got a regression report from Dominique Dumont (CC'ed) in
> https://bugs.debian.org/1005005: after an update to a 5.15.15-based
> kernel, his machine no longer suspends correctly. After the screen goes
> black as usual, it comes back. The Debian bug above contains a trace.
>
> Dominique confirmed that this issue persisted after updating to 5.16.7;
> furthermore, he bisected the issue and found
>
> 3c196f05666610912645c7c5d9107706003f67c3 is the first bad commit
> commit 3c196f05666610912645c7c5d9107706003f67c3
> Author: Alex Deucher <[email protected]>
> Date: Fri Nov 12 11:25:30 2021 -0500
>
> drm/amdgpu: always reset the asic in suspend (v2)
>
> [ Upstream commit daf8de0874ab5b74b38a38726fdd3d07ef98a7ee ]
>
> If the platform suspend happens to fail and the power rail
> is not turned off, the GPU will be in an unknown state on
> resume, so reset the asic so that it will be in a known
> good state on resume even if the platform suspend failed.
>
> v2: handle s0ix
>
> Acked-by: Luben Tuikov <[email protected]>
> Acked-by: Evan Quan <[email protected]>
> Signed-off-by: Alex Deucher <[email protected]>
> Signed-off-by: Sasha Levin <[email protected]>
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> to be the first bad commit, see https://bugs.debian.org/1005005#34 .
>
> Does this ring any bell? Any idea on the problem?
>
> Regards,
> Salvatore

--
Additional information about regzbot:

If you want to know more about regzbot, check out its web-interface, the
getting started guide, and the reference documentation:

https://linux-regtracking.leemhuis.info/regzbot/
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md

The last two documents will explain how you can interact with regzbot
yourself if you want to.

Hint for reporters: when reporting a regression it's in your interest to
CC the regression list and tell regzbot about the issue, as that ensures
the regression makes it onto the radar of the Linux kernel's regression
tracker -- that's in your interest, as it ensures your report won't fall
through the cracks unnoticed.

Hint for developers: you normally don't need to care about regzbot once
it's involved. Fix the issue as you normally would, just remember to
include 'Link:' tags in the patch descriptions pointing to all reports
about the issue. This has been expected from developers even before
regzbot showed up for reasons explained in
'Documentation/process/submitting-patches.rst' and
'Documentation/process/5.Posting.rst'.

2022-02-14 22:34:54

by Alex Deucher

Subject: Re: Regression from 3c196f056666 ("drm/amdgpu: always reset the asic in suspend (v2)") on suspend?

On Sat, Feb 12, 2022 at 1:23 PM Salvatore Bonaccorso <[email protected]> wrote:
>
> Hi Alex, hi all
>
> In Debian we got a regression report from Dominique Dumont (CC'ed) in
> https://bugs.debian.org/1005005: after an update to a 5.15.15-based
> kernel, his machine no longer suspends correctly. After the screen goes
> black as usual, it comes back. The Debian bug above contains a trace.
>
> Dominique confirmed that this issue persisted after updating to 5.16.7;
> furthermore, he bisected the issue and found
>
> 3c196f05666610912645c7c5d9107706003f67c3 is the first bad commit
> commit 3c196f05666610912645c7c5d9107706003f67c3
> Author: Alex Deucher <[email protected]>
> Date: Fri Nov 12 11:25:30 2021 -0500
>
> drm/amdgpu: always reset the asic in suspend (v2)
>
> [ Upstream commit daf8de0874ab5b74b38a38726fdd3d07ef98a7ee ]
>
> If the platform suspend happens to fail and the power rail
> is not turned off, the GPU will be in an unknown state on
> resume, so reset the asic so that it will be in a known
> good state on resume even if the platform suspend failed.
>
> v2: handle s0ix
>
> Acked-by: Luben Tuikov <[email protected]>
> Acked-by: Evan Quan <[email protected]>
> Signed-off-by: Alex Deucher <[email protected]>
> Signed-off-by: Sasha Levin <[email protected]>
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> to be the first bad commit, see https://bugs.debian.org/1005005#34 .
>
> Does this ring any bell? Any idea on the problem?

Does the system actually suspend? Putting the GPU into reset on
suspend shouldn't cause any problems since the power rail will
presumably be cut by the platform. Is this system S0i3 or regular S3?
Does this patch help by any chance?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e55a3aea418269266d84f426b3bd70794d3389c8

Alex


>
> Regards,
> Salvatore

2022-02-15 03:14:32

by Evan Quan

Subject: RE: Regression from 3c196f056666 ("drm/amdgpu: always reset the asic in suspend (v2)") on suspend?


> -----Original Message-----
> From: Salvatore Bonaccorso <[email protected]> On Behalf
> Of Salvatore Bonaccorso
> Sent: Sunday, February 13, 2022 2:24 AM
> To: Deucher, Alexander <[email protected]>
> Cc: Dominique Dumont <[email protected]>; [email protected];
> Tuikov, Luben <[email protected]>; Quan, Evan
> <[email protected]>; Sasha Levin <[email protected]>; Koenig, Christian
> <[email protected]>; Pan, Xinhui <[email protected]>; David
> Airlie <[email protected]>; Daniel Vetter <[email protected]>; amd-
> [email protected]; [email protected]; linux-
> [email protected]
> Subject: Regression from 3c196f056666 ("drm/amdgpu: always reset the asic
> in suspend (v2)") on suspend?
>
> Hi Alex, hi all
>
> In Debian we got a regression report from Dominique Dumont (CC'ed) in
> https://bugs.debian.org/1005005: after an update to a 5.15.15-based
> kernel, his machine no longer suspends correctly. After the screen goes
> black as usual, it comes back. The Debian bug above contains a trace.
>
> Dominique confirmed that this issue persisted after updating to 5.16.7;
> furthermore, he bisected the issue and found
>
> 3c196f05666610912645c7c5d9107706003f67c3 is the first bad commit
> commit 3c196f05666610912645c7c5d9107706003f67c3
> Author: Alex Deucher <[email protected]>
> Date: Fri Nov 12 11:25:30 2021 -0500
>
> drm/amdgpu: always reset the asic in suspend (v2)
>
> [ Upstream commit daf8de0874ab5b74b38a38726fdd3d07ef98a7ee ]
>
> If the platform suspend happens to fail and the power rail
> is not turned off, the GPU will be in an unknown state on
> resume, so reset the asic so that it will be in a known
> good state on resume even if the platform suspend failed.
>
> v2: handle s0ix
>
> Acked-by: Luben Tuikov <[email protected]>
> Acked-by: Evan Quan <[email protected]>
> Signed-off-by: Alex Deucher <[email protected]>
> Signed-off-by: Sasha Levin <[email protected]>
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> to be the first bad commit, see https://bugs.debian.org/1005005#34 .

I checked the backtrace posted there (below). It seems the error occurred during amdgpu_device_suspend().
That means Alex's patch should not be related (as it only affects the logic after amdgpu_device_suspend()).
So we might have got the wrong regression point here.

[ 257.842851] ? vi_common_set_clockgating_state+0x229/0x2f0 [amdgpu]
[ 257.843356] amdgpu_device_ip_suspend_phase1+0x5e/0xc0 [amdgpu]
[ 257.843771] amdgpu_device_suspend+0x62/0xc0 [amdgpu]
[ 257.844184] amdgpu_pmops_suspend+0x36/0x70 [amdgpu]
[ 257.844631] pci_pm_suspend+0x71/0x160
[ 257.844643] ? pci_pm_freeze+0xb0/0xb0

BR
Evan
>
> Does this ring any bell? Any idea on the problem?
>
> Regards,
> Salvatore

2022-02-21 07:33:25

by Eric Valette

Subject: Re: Regression from 3c196f056666 ("drm/amdgpu: always reset the asic in suspend (v2)") on suspend?

On 20/02/2022 16:48, Dominique Dumont wrote:
> On Monday, 14 February 2022 22:52:27 CET Alex Deucher wrote:
>> Does the system actually suspend?
>
> Not really. The screen looks like it's going to suspend, but it does come
> back after 10s or so. The light mounted in the middle of the power button does
> not switch off.


As I have a very similar problem and also commented on the original
Debian bug report
(https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1005005), I will add
some information here on another AMD-only laptop (Renoir AMD Ryzen 7
4800H with Radeon Graphics + Radeon RX 5500/5500M / Pro 5500M).

For me, suspend works once, but after the first resume I see a RIP in
dmesg (I do not know whether it is in the suspend path or the resume
path; see additional info in the Debian bug), and later suspends do not
work: it only goes to the KDE login screen.

Due to network connectivity, I was unable to do a full bisect, but I
tested with the patch I had on my laptop:

5.10.101 works; 5.10 from Debian works
5.11 works
5.12 works
5.13: suspend works, but when resuming the PC is dead and I have to reboot
5.14 seems to work, but dmesg is full of RIP messages at
various places.
5.15.24 behaves as described; 5.15 from Debian behaves identically.
5.16 from Debian behaves identically.

>> Is this system S0i3 or regular S3?

For me it is real S3.

The proposed patch is intended for Intel + Intel GPU + amdgpu systems, but I
have dual AMD GPUs.

--eric


2022-02-21 09:13:23

by Dominique Dumont

Subject: Re: Regression from 3c196f056666 ("drm/amdgpu: always reset the asic in suspend (v2)") on suspend?

On Monday, 14 February 2022 22:52:27 CET Alex Deucher wrote:
> Does the system actually suspend?

Not really. The screen looks like it's going to suspend, but it does come
back after 10s or so. The light mounted in the middle of the power button does
not switch off.

> Is this system S0i3 or regular S3?

I'm not sure how to check that. After a bit of reading on the Internet [1], I
hope that the following information answers that question. Please get back to
me if that's not the case.

It looks like my system supports both S0i3 and S3:

$ cat /sys/power/state
freeze mem disk

I get the same result running these 2 commands as root:
# echo freeze > /sys/power/state
# echo mem > /sys/power/state
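For reference, per the kernel's sleep-states documentation, writing "freeze" to /sys/power/state always selects suspend-to-idle, while "mem" selects whichever mode is shown in brackets in /sys/power/mem_sleep ("deep" is suspend-to-RAM, i.e. S3 on ACPI systems). So getting the same result from both commands does not by itself distinguish S0i3 from S3. Below is a minimal sketch for checking which mode "mem" maps to. The helper names `selected_mem_sleep` and `mode_meaning` are hypothetical and made up for illustration; only the sysfs files are real.

```sh
# Print the entry shown in square brackets in a mem_sleep-style string,
# e.g. "s2idle [deep]" -> "deep". Prints nothing if no entry is bracketed.
selected_mem_sleep() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Describe a mem_sleep mode name (meanings as in the kernel's
# sleep-states documentation).
mode_meaning() {
    case "$1" in
        deep)    echo "suspend-to-RAM (ACPI S3 on ACPI systems)" ;;
        s2idle)  echo "suspend-to-idle (what 'freeze' always selects)" ;;
        shallow) echo "standby (ACPI S1 on ACPI systems)" ;;
        *)       echo "unknown" ;;
    esac
}

# Only query sysfs when it is actually readable on this machine.
if [ -r /sys/power/mem_sleep ]; then
    modes=$(cat /sys/power/mem_sleep)
    current=$(selected_mem_sleep "$modes")
    echo "available mem_sleep modes: $modes"
    echo "'mem' currently selects: ${current:-?} ($(mode_meaning "$current"))"
fi
```

If the bracketed entry is "s2idle", then "echo mem" and "echo freeze" exercise the same state, which would explain identical results.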

> Does this patch help by any chance?
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e55a3aea418269266d84f426b3bd70794d3389c8

yes, with this patch:
- the suspend issue is solved
- kernel logs no longer show messages like "failed to send message" or
"*ERROR* suspend of IP block <powerplay> failed" while suspending

All the best

[1] https://01.org/blogs/rzhang/2015/best-practice-debug-linux-suspend/hibernate-issues


2022-02-22 05:29:56

by Alex Deucher

Subject: Re: Regression from 3c196f056666 ("drm/amdgpu: always reset the asic in suspend (v2)") on suspend?

On Mon, Feb 21, 2022 at 3:29 AM Eric Valette <[email protected]> wrote:
>
> On 20/02/2022 16:48, Dominique Dumont wrote:
> > On Monday, 14 February 2022 22:52:27 CET Alex Deucher wrote:
> >> Does the system actually suspend?
> >
> > Not really. The screen looks like it's going to suspend, but it does come
> > back after 10s or so. The light mounted in the middle of the power button does
> > not switch off.
>
>
> As I have a very similar problem and also commented on the original
> Debian bug report
> (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1005005), I will add
> some information here on another AMD-only laptop (Renoir AMD Ryzen 7
> 4800H with Radeon Graphics + Radeon RX 5500/5500M / Pro 5500M).
>
> For me, suspend works once, but after the first resume I see a RIP in
> dmesg (I do not know whether it is in the suspend path or the resume
> path; see additional info in the Debian bug), and later suspends do not
> work: it only goes to the KDE login screen.
>
> Due to network connectivity, I was unable to do a full bisect, but I
> tested with the patch I had on my laptop:
>
> 5.10.101 works; 5.10 from Debian works
> 5.11 works
> 5.12 works
> 5.13: suspend works, but when resuming the PC is dead and I have to reboot
> 5.14 seems to work, but dmesg is full of RIP messages at
> various places.
> 5.15.24 behaves as described; 5.15 from Debian behaves identically.
> 5.16 from Debian behaves identically.
>
> >> Is this system S0i3 or regular S3?
>
> For me it is real S3.
>
> The proposed patch is intended for Intel + Intel GPU + amdgpu systems, but I
> have dual AMD GPUs.

It doesn't really matter what the platform is; it could still
potentially help on your system. It depends on the BIOS implementation
for your platform and how it handles suspend. You can try the patch,
but I don't think you are hitting the same issue. A bisect would be
helpful in your case.

Alex

2022-02-24 15:19:04

by Eric Valette

Subject: Re: Regression from 3c196f056666 ("drm/amdgpu: always reset the asic in suspend (v2)") on suspend?

On 2/21/22 15:16, Alex Deucher wrote:

>>>> Is this system S0i3 or regular S3?
>>
>> For me it is real S3.
>>
>> The proposed patch is intended for Intel + Intel GPU + amdgpu systems, but I
>> have dual AMD GPUs.
> It doesn't really matter what the platform is; it could still
> potentially help on your system. It depends on the BIOS implementation
> for your platform and how it handles suspend. You can try the patch,
> but I don't think you are hitting the same issue. A bisect would be
> helpful in your case.

Trying to apply the patch on top of 5.15.24, I got an "already applied" message, and indeed the patch is already there. So this particular patch does not fix my problem.

I saw new modifications in 5.15.25. I will try them and check whether I can find time to bisect.

-- eric

2022-03-21 21:30:50

by Dominique Dumont

Subject: Re: Regression from 3c196f056666 ("drm/amdgpu: always reset the asic in suspend (v2)") on suspend?

Hi

On Monday, 21 March 2022 09:57:59 CET Thorsten Leemhuis wrote:
> Dominique/Salvatore/Eric, what's the status of this regression?
> According to the debian bug tracker the problem is solved with 5.16 and
> 5.17, but was 5.15 ever fixed?

I don't think so.

On kernel side, the commit fixing this issue is
e55a3aea418269266d84f426b3bd70794d3389c8 .

According to the logs of [1], this commit landed in v5.17-rc3.

HTH

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git


2022-03-21 21:44:42

by Diederik de Haas

Subject: Re: Bug#1005005: Regression from 3c196f056666 ("drm/amdgpu: always reset the asic in suspend (v2)") on suspend?

On Monday, 21 March 2022 19:49:56 CET Dominique Dumont wrote:
> On Monday, 21 March 2022 09:57:59 CET Thorsten Leemhuis wrote:
> > Dominique/Salvatore/Eric, what's the status of this regression?
> > According to the debian bug tracker the problem is solved with 5.16 and
> > 5.17, but was 5.15 ever fixed?
>
> I don't think so.
>
> On kernel side, the commit fixing this issue is
> e55a3aea418269266d84f426b3bd70794d3389c8 .
>
> According to the logs of [1], this commit landed in v5.17-rc3.

It was included in 5.15.22, but the newest 5.15 kernel uploaded to Debian was
5.15.15, so there is no fixed 5.15 in Debian.
It was also included in 5.16.8, and the earliest version in Debian which had
that commit was 5.16.10 (uploaded 2022-02-18 to Unstable). The current version in
Unstable is 5.16.14. Testing/Bookworm now has 5.16.12.
In Experimental, 5.17-rc3 was uploaded on 2022-02-12.
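The reasoning above comes down to a per-series version comparison against the backport points stated in this thread (5.15.22 and 5.16.8). Below is a minimal sketch, assuming GNU coreutils `sort -V` is available. `version_ge` and `has_fix` are hypothetical helpers written only for illustration, not part of any Debian or kernel tooling. The sketch handles plain upstream version numbers only, not Debian package revisions.

```sh
# Succeed (exit 0) if version $1 sorts at or after version $2.
# `sort -V` compares each component numerically, so 5.15.9 < 5.15.22.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeed if a stable kernel version already contains e55a3aea4182,
# using the backport points quoted in this thread.
# Returns 2 for series this sketch does not cover.
has_fix() {
    case "$1" in
        5.15.*) version_ge "$1" 5.15.22 ;;
        5.16.*) version_ge "$1" 5.16.8 ;;
        *)      return 2 ;;
    esac
}

# Versions mentioned in this thread.
for v in 5.15.15 5.15.24 5.16.7 5.16.10 5.16.14; do
    if has_fix "$v"; then
        echo "$v: contains the fix"
    else
        echo "$v: does not contain the fix"
    fi
done
```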

HTH,
Diederik



2022-03-21 22:01:34

by Eric Valette

Subject: Re: Regression from 3c196f056666 ("drm/amdgpu: always reset the asic in suspend (v2)") on suspend?

My problem has never been fixed. The proposed patch has been applied to 5.15. I do not remember which version; 28, maybe.

I still have a RIP in pm_suspend. I did not test the last two 5.15 versions.

I can live with 5.10 and my own compiled kernels.

Thanks for asking.

On 21 March 2022 09:58:01, Thorsten Leemhuis <[email protected]> wrote:

> Hi, this is your Linux kernel regression tracker. Top-posting for once,
> to make this easily accessible to everyone.
>
> Dominique/Salvatore/Eric, what's the status of this regression?
> According to the debian bug tracker the problem is solved with 5.16 and
> 5.17, but was 5.15 ever fixed?
>
> Ciao, Thorsten
>
> On 21.02.22 15:16, Alex Deucher wrote:
>> On Mon, Feb 21, 2022 at 3:29 AM Eric Valette <[email protected]> wrote:
>>>
>>> On 20/02/2022 16:48, Dominique Dumont wrote:
>>>> On Monday, 14 February 2022 22:52:27 CET Alex Deucher wrote:
>>>>> Does the system actually suspend?
>>>>
>>>> Not really. The screen looks like it's going to suspend, but it does come
>>>> back after 10s or so. The light mounted in the middle of the power button does
>>>> not switch off.
>>>
>>>
>>> As I have a very similar problem and also commented on the original
>>> Debian bug report
>>> (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1005005), I will add
>>> some information here on another AMD-only laptop (Renoir AMD Ryzen 7
>>> 4800H with Radeon Graphics + Radeon RX 5500/5500M / Pro 5500M).
>>>
>>> For me, suspend works once, but after the first resume I see a RIP in
>>> dmesg (I do not know whether it is in the suspend path or the resume
>>> path; see additional info in the Debian bug), and later suspends do not
>>> work: it only goes to the KDE login screen.
>>>
>>> Due to network connectivity, I was unable to do a full bisect, but I
>>> tested with the patch I had on my laptop:
>>>
>>> 5.10.101 works; 5.10 from Debian works
>>> 5.11 works
>>> 5.12 works
>>> 5.13: suspend works, but when resuming the PC is dead and I have to reboot
>>> 5.14 seems to work, but dmesg is full of RIP messages at
>>> various places.
>>> 5.15.24 behaves as described; 5.15 from Debian behaves identically.
>>> 5.16 from Debian behaves identically.
>>>
>>>>> Is this system S0i3 or regular S3?
>>>
>>> For me it is real S3.
>>>
>>> The proposed patch is intended for Intel + Intel GPU + amdgpu systems, but I
>>> have dual AMD GPUs.
>>
>> It doesn't really matter what the platform is; it could still
>> potentially help on your system. It depends on the BIOS implementation
>> for your platform and how it handles suspend. You can try the patch,
>> but I don't think you are hitting the same issue. A bisect would be
>> helpful in your case.
>>
>> Alex

Subject: Re: Regression from 3c196f056666 ("drm/amdgpu: always reset the asic in suspend (v2)") on suspend?

Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

Dominique/Salvatore/Eric, what's the status of this regression?
According to the debian bug tracker the problem is solved with 5.16 and
5.17, but was 5.15 ever fixed?

Ciao, Thorsten

On 21.02.22 15:16, Alex Deucher wrote:
> On Mon, Feb 21, 2022 at 3:29 AM Eric Valette <[email protected]> wrote:
>>
>> On 20/02/2022 16:48, Dominique Dumont wrote:
>>> On Monday, 14 February 2022 22:52:27 CET Alex Deucher wrote:
>>>> Does the system actually suspend?
>>>
>>> Not really. The screen looks like it's going to suspend, but it does come
>>> back after 10s or so. The light mounted in the middle of the power button does
>>> not switch off.
>>
>>
>> As I have a very similar problem and also commented on the original
>> Debian bug report
>> (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1005005), I will add
>> some information here on another AMD-only laptop (Renoir AMD Ryzen 7
>> 4800H with Radeon Graphics + Radeon RX 5500/5500M / Pro 5500M).
>>
>> For me, suspend works once, but after the first resume I see a RIP in
>> dmesg (I do not know whether it is in the suspend path or the resume
>> path; see additional info in the Debian bug), and later suspends do not
>> work: it only goes to the KDE login screen.
>>
>> Due to network connectivity, I was unable to do a full bisect, but I
>> tested with the patch I had on my laptop:
>>
>> 5.10.101 works; 5.10 from Debian works
>> 5.11 works
>> 5.12 works
>> 5.13: suspend works, but when resuming the PC is dead and I have to reboot
>> 5.14 seems to work, but dmesg is full of RIP messages at
>> various places.
>> 5.15.24 behaves as described; 5.15 from Debian behaves identically.
>> 5.16 from Debian behaves identically.
>>
>>>> Is this system S0i3 or regular S3?
>>
>> For me it is real S3.
>>
>> The proposed patch is intended for Intel + Intel GPU + amdgpu systems, but I
>> have dual AMD GPUs.
>
> It doesn't really matter what the platform is; it could still
> potentially help on your system. It depends on the BIOS implementation
> for your platform and how it handles suspend. You can try the patch,
> but I don't think you are hitting the same issue. A bisect would be
> helpful in your case.
>
> Alex

Subject: Re: Regression from 3c196f056666 ("drm/amdgpu: always reset the asic in suspend (v2)") on suspend?

On 21.03.22 13:07, Éric Valette wrote:
> My problem has never been fixed.
>
> The proposed patch has been applied to 5.15. I do not remember which version; 28, maybe.
>
> I still have a RIP in pm_suspend. I did not test the last two 5.15 versions.
>
> I can live with 5.10 and my own compiled kernels.
>
> Thanks for asking.

This thread/the debian bug report
(https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1005005 ) is getting
long which makes things hard to grasp. But to me it looks a lot like the
problem you are facing is different from the problem that others ran
into and bisected -- but I might be totally wrong there. Have you ever
tried reverting 3c196f056666 to see if it helps (sorry if that's
mentioned in the bug report somewhere, as I said, it became long)? I
guess a bisection from your side really would help a lot; but before you
go down that route you might want to give 5.17 and the latest 5.15.y
kernel a try.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)


Subject: Re: Regression from 3c196f056666 ("drm/amdgpu: always reset the asic in suspend (v2)") on suspend?

On 21.03.22 19:49, Dominique Dumont wrote:
> On Monday, 21 March 2022 09:57:59 CET Thorsten Leemhuis wrote:
>> Dominique/Salvatore/Eric, what's the status of this regression?
>> According to the debian bug tracker the problem is solved with 5.16 and
>> 5.17, but was 5.15 ever fixed?
>
> I don't think so.
>
> On kernel side, the commit fixing this issue is
> e55a3aea418269266d84f426b3bd70794d3389c8 .
>
> According to the logs of [1], this commit landed in v5.17-rc3.
>
> HTH
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

And from there it got backported to, among others, 5.15.22:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.15.y&id=8a15ac1786c92dce6ecbeb4e4c237f5f80c2c703

https://lwn.net/Articles/884107/

Another indicator that Eric's problem is something else.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
