Subject: [REGRESSION] Bug 218538 - 3cc2ffe5c16d from 6.6 breaks S3 resume on SATA SSD OPAL

Hi, Thorsten here, the Linux kernel's regression tracker.

Damien, I noticed a regression report in bugzilla.kernel.org that seems
to be caused by a commit of yours. As many (most?) kernel developers
don't keep an eye on it, I decided to forward it by mail.

Note, you have to use bugzilla to reach the reporter, as I sadly[1]
cannot CC them in mails like this.

Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=218538 :

> Problem: since linux kernel 6.1.64 (which corresponds to Debian
> linux-image-6.1.0-14-amd64 through 15, 16, 17 and 18) the system is
> unable to fully wake up from suspend. Most of the time it wakes up to a
> black screen and CAPS LOCK led doesn't change when pressing the CAPS
> LOCK button. Sometimes the monitor turns on and I can log in on a tty but
> no command ever works. Not even `reboot` `shutdown` etc. Regardless if
> the monitor turns on, I can shut down with Alt + SysRq + B.

The user later confirmed the problem still occurs with a recent mainline
and bisected it to 3cc2ffe5c16dc6 ("scsi: sd: Differentiate system and
runtime start/stop management") [v6.6-rc4].

See the ticket for more details.


[TLDR for the rest of this mail: I'm adding this report to the list of
tracked Linux kernel regressions; the text you find below is based on a
few template paragraphs you might have encountered already in similar
form.]

BTW, let me use this mail to also add the report to the list of tracked
regressions to ensure it doesn't fall through the cracks:

#regzbot introduced: 3cc2ffe5c16dc6
#regzbot title: scsi: sd: S3 resume on SATA SSD OPAL broke
#regzbot from: desgua
#regzbot duplicate: https://bugzilla.kernel.org/show_bug.cgi?id=218538
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (e.g. the bugzilla ticket and maybe this mail as well, if
this thread sees some discussion). See page linked in footer for details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

[1] because bugzilla.kernel.org tells users upon registration their
"email address will never be displayed to logged out users"


2024-03-13 07:17:27

by Damien Le Moal

Subject: Re: [REGRESSION] Bug 218538 - 3cc2ffe5c16d from 6.6 breaks S3 resume on SATA SSD OPAL

On 3/11/24 22:25, Thorsten Leemhuis wrote:
> Hi, Thorsten here, the Linux kernel's regression tracker.
>
> Damien, I noticed a regression report in bugzilla.kernel.org that seems
> to be caused by a commit of yours. As many (most?) kernel developers
> don't keep an eye on it, I decided to forward it by mail.
>
> Note, you have to use bugzilla to reach the reporter, as I sadly[1]
> cannot CC them in mails like this.
>
> Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=218538 :
>
>> Problem: since linux kernel 6.1.64 (which corresponds to Debian
>> linux-image-6.1.0-14-amd64 through 15, 16, 17 and 18) the system is
>> unable to fully wake up from suspend. Most of the time it wakes up to a
>> black screen and CAPS LOCK led doesn't change when pressing the CAPS
>> LOCK button. Sometimes the monitor turns on and I can log in on a tty but
>> no command ever works. Not even `reboot` `shutdown` etc. Regardless if
>> the monitor turns on, I can shut down with Alt + SysRq + B.
>
> The user later confirmed the problem still occurs with a recent mainline
> and bisected it to 3cc2ffe5c16dc6 ("scsi: sd: Differentiate system and
> runtime start/stop management") [v6.6-rc4].

+ linux-scsi and Martin

Thorsten,

Thank you for bringing this to my attention. Checking the code, I think I
understand what is going on here: commit
3cc2ffe5c16dc65dfac354bc5b5bc98d3b397567 changed sd_resume() to do nothing and
delegate the disk resume to libata, as it should, because we cannot issue any
command, even START STOP UNIT, unless the ata port and device are first fully
resumed. However, the change also causes
opal_unlock_from_suspend(sdkp->opal_dev) to *NOT* be called, thus leaving the
drive locked after resume, and therefore unusable.
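
To make the ordering problem concrete, here is a small user-space toy model
of the sequence described above. It is only a sketch, not kernel code: every
toy_* name is made up for illustration, and the model assumes nothing beyond
the two points made in this mail (the sd resume step now does nothing, and no
command can reach the drive until the port is resumed).

```c
#include <stdbool.h>

/* Toy model only: these are NOT kernel structures or kernel functions. */
struct toy_disk {
    bool port_resumed;  /* modelled ATA port and device fully resumed? */
    bool opal_locked;   /* modelled OPAL lock state of the drive */
};

/* S3 cuts power: the port goes down and the drive comes back locked. */
static void toy_suspend(struct toy_disk *d)
{
    d->port_resumed = false;
    d->opal_locked = true;
}

/* Stand-in for the resume work that is delegated to libata. */
static void toy_libata_resume(struct toy_disk *d)
{
    d->port_resumed = true;
}

/* Stand-in for sd_resume() after the commit: it does nothing. */
static void toy_sd_resume(struct toy_disk *d)
{
    (void)d;
}

/* Stand-in for the OPAL unlock: fails (-1) until the port is resumed. */
static int toy_opal_unlock(struct toy_disk *d)
{
    if (!d->port_resumed)
        return -1;
    d->opal_locked = false;
    return 0;
}

/* Resume as described above: nobody calls the unlock step. */
static void toy_system_resume(struct toy_disk *d)
{
    toy_sd_resume(d);
    toy_libata_resume(d);
}
```

In this model, a disk that goes through toy_suspend() and then
toy_system_resume() ends up with the port resumed but still locked. Calling
toy_opal_unlock() before the port is resumed fails, while calling it
afterwards succeeds, which is the ordering constraint any fix has to respect.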

Fixing this is not trivial because, as mentioned above, we must first wait for
the ata port and device to be resumed before attempting to access the drive. So
I will need some brainstorming to come up with a fix. Give me a couple of days
please (I do have a SED OPAL drive so I should be able to reproduce this issue).

--
Damien Le Moal
Western Digital Research