2023-11-28 13:10:15

by Bagas Sanjaya

Subject: Fwd: Kernels v6.5 and v6.6 break resume from standby (s3) on some Intel systems if VT-d is enabled

Hi,

I notice a regression report on Bugzilla [1]. Quoting from it:

> Note:
>
> I'm just a Linux user, I don't work in IT or even write code, so, I'm probably using terms to describe the issue that are not the ones someone who knows code and what the system does under the hood would use.
>
> Affected system:
>
> Thinkpad, Intel Kaby Lake (i7-7600U) chipset / cpu and onboard gpu (Intel HD 620), no separate graphics card, current bios firmware; running Void Linux, xfce / lightdm
>
> Symptom / problem:
>
> Since the upgrade to kernel v6.5.5 (from v6.3.13) my system doesn't wake up from standby, i.e. resume from s3 fails 100% of the time.
> When pressing a key or the power button nothing happens. The LED that indicates different states of the system, keeps indicating standby mode.
> The only way to use the system again is hard reset by pressing the power button for a few seconds.
>
> So, there is no crashing on resume or incomplete resume or only sometimes failing to resume or failing to go into standby in the first place.
>
> Granted, this issue was present with kernels before v6.5, but only occasionally and it would not re-appear for many many boot cycles. So, I never had any lead as to why it would happen.
>
> I installed kernel v6.4.16 to test for the bug - it's not in there.
>
> For further testing I also installed kernel v6.5.2, as this was the first kernel of the 6.5 series available on void linux, (and because the kernel logs mention VT-d for kernel v6.5.5 and v6.5.3, see below). Result: The bug is already in v6.5.2, too.
>
> There's only one thing I noticed from comparing logs between kernels v6.5/6.6 vs v6.1/6.3/6.4. In the moment the system goes into standby, if running one of the latter three kernel versions the system would print the following messages:
>
> [elogind-daemon] Entering sleep state 'suspend'...
> [kernel] PM: suspend entry (deep)
>
>
> But with kernels v6.5/6.6, the kernel message is missing, only the elogind-daemon message shows up in the logs. As if the kernel didn't get the memo and thus didn't prepare and didn't listen for the wake-up call to resume.
>
>
> To see if this is a bug that might be tied to a certain chipset / cpu generation, I tested kernel v6.5 on my old Thinkpad (Intel Sandy Bridge chipset / cpu, and also onboard graphics only). Its BIOS also has VT-d enabled. Interestingly, on that system, resume from standby with kernel v6.5 is no problem, even though its system is set up the same as the current Thinkpad.
>
> So, this bug seems to be limited to a certain set of chipsets / cpus, which seems plausible, as I couldn't find a bug report on this - not too many seem to be affected.
>
>
>
> There's an older bug report on similar symptoms, but the cure doesn't work on my system:
>
> "intel_iommu=on breaks resume from suspend on several Thinkpad models"
> https://bugzilla.kernel.org/show_bug.cgi?id=197029
>
>
> It sounds just like what my system is experiencing - apart from the fact that the term suspend is sometimes also used to describe hibernation, and the bug report does not specify which one is meant.
>
> So, I was hopeful on the one hand that the (workaround) fix (adding intel_iommu=off to the kernel parameters) would work on my system, too - on the other hand, this bug report was for kernel v4.13, so it's probably not necessarily relevant to similar symptoms with kernel v6.5 and v6.6, respectively.
>
> Anyway, adding intel_iommu=off to the kernel parameters didn't change anything on my system. Once the system was running, I of course made sure that intel_iommu=off was indeed in use as one of the kernel parameters.
>
>
> With this information in mind I did a regular internet search and found some information that in case intel_iommu=off in the kernel parameters doesn't help, disabling VT-d in BIOS might.
> And in my case it does indeed help avoiding the bug - for both kernel versions, v6.5 and v6.6.
>
> Reading some other bug reports and some changelogs, I noticed that iommu and VT-d are connected, so I posted this bug report in drivers/iommu. If it is misplaced here, please feel free to move it to the correct category.
>
>
> I attached a file with the output of some commands I found being used in several other bug reports on here, just in case they might be needed / helpful.
>
>
> Thank you very much for your help in advance!

See Bugzilla for the full thread.

Anyway, I'm adding this regression to regzbot:

#regzbot introduced: v6.3..v6.5 https://bugzilla.kernel.org/show_bug.cgi?id=218191
#regzbot title: resume from standby fails on Thinkpad with Kaby Lake CPU

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=218191

--
An old man doll... just what I always wanted! - Clara
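
For anyone who wants to repeat the two checks described in the report above (whether intel_iommu=off actually reached the running kernel, and whether the kernel logged its suspend entry after elogind requested the suspend), here is a minimal Python sketch. The function names and the sample inputs are illustrative assumptions and are not taken from the Bugzilla report; only the two log strings and the intel_iommu=off parameter come from the quoted text.

```python
# Log fragments quoted in the report above.
SUSPEND_REQUEST = "Entering sleep state 'suspend'"  # printed by elogind
KERNEL_ENTRY = "PM: suspend entry (deep)"           # printed by the kernel


def has_kernel_param(cmdline: str, param: str) -> bool:
    """Return True if `param` is a whole token on the kernel command line."""
    return param in cmdline.split()


def suspends_without_kernel_entry(log_lines) -> int:
    """Count suspend requests that are not followed by the kernel's
    suspend-entry message before the next request or the end of the log."""
    missing = 0
    pending = False  # True while a request is waiting for the kernel line
    for line in log_lines:
        if SUSPEND_REQUEST in line:
            if pending:
                # Previous request never got its kernel message.
                missing += 1
            pending = True
        elif KERNEL_ENTRY in line and pending:
            pending = False
    if pending:
        missing += 1
    return missing
```

On a running Linux system the first check could be fed with the contents of /proc/cmdline, for example has_kernel_param(open("/proc/cmdline").read(), "intel_iommu=off"). The second function expects the log lines in chronological order, from whatever log file the system uses.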


2023-12-03 12:31:59

by Bagas Sanjaya

Subject: Re: Fwd: Kernels v6.5 and v6.6 break resume from standby (s3) on some Intel systems if VT-d is enabled

On Tue, Nov 28, 2023 at 08:09:24PM +0700, Bagas Sanjaya wrote:
> Hi,
>
> I notice a regression report on Bugzilla [1]. Quoting from it:
>
> See Bugzilla for the full thread.
>
> Anyway, I'm adding this regression to regzbot:
>
> #regzbot introduced: v6.3..v6.5 https://bugzilla.kernel.org/show_bug.cgi?id=218191
> #regzbot title: resume from standby fails on Thinkpad with Kaby Lake CPU
>

The reporter had done bisection (see Bugzilla for details), so telling
regzbot:

#regzbot introduced: 0c7ffa32dbd6b0

Thanks.

--
An old man doll... just what I always wanted! - Clara



2023-12-05 14:58:24

by Thorsten Leemhuis

Subject: Re: Fwd: Kernels v6.5 and v6.6 break resume from standby (s3) on some Intel systems if VT-d is enabled

[CCing Mario, as he might be interested]

On 03.12.23 13:31, Bagas Sanjaya wrote:
> On Tue, Nov 28, 2023 at 08:09:24PM +0700, Bagas Sanjaya wrote:
>> I notice a regression report on Bugzilla [1]. Quoting from it:
>>>
>>> Affected system:
>>>
>>> Thinkpad, Intel Kaby Lake (i7-7600U) chipset / cpu and onboard
>>> gpu (Intel HD 620), no separate graphics card, current bios
>>> firmware; running Void Linux, xfce / lightdm
>>>
>>> Symptom / problem:
>>>
>>> Since the upgrade to kernel v6.5.5 (from v6.3.13) my system
>>> doesn't wake up from standby, i.e. resume from s3 fails 100% of
>>> the time.
>
>> https://bugzilla.kernel.org/show_bug.cgi?id=218191
>
> The reporter had done bisection (see Bugzilla for details),

Turns out the kernel used by the reporter is not vanilla, as zfs is
involved -- and due to that the reporter is not even able to check if
mainline is still affected. I'll thus remove this from the tracking.
Sorry for the noise.

In case anyone nevertheless cares: The bisection result from the report
was 0c7ffa32dbd6b0 ("x86/smpboot/64: Implement
arch_cpuhp_init_parallel_bringup() and enable it") [v6.5-rc1].
This time on an Intel machine. Mario's "Fixes for s3 with parallel
bootup" patch series [1] (the one that was abandoned because things turned
out to be a BIOS bug affecting some AMD systems) apparently resolved the
problem for the reporter.

[1]
https://lore.kernel.org/all/[email protected]/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot inconclusive: kernel used by the reporter is not vanilla