2024-05-30 11:49:49

by David Wang

Subject: [Regression] 6.10-rc1: Fail to resurrect from suspend.

Hi,

My system fails to resume after `systemctl suspend` with 6.10-rc1.
When I press the power button, the machine sounds like it is starting
(fans roaring), but my keyboard, mouse, and monitor are not powered, and
I have no option but to power-cycle the system.

I ran a bisect session and narrowed it down to the following commit:

commit d74169ceb0d2e32438946a2f1f9fc8c803304bd6
Author: Dimitri Sivanich <[email protected]>
Date: Wed Apr 24 15:16:29 2024 +0800

iommu/vt-d: Allocate DMAR fault interrupts locally

The Intel IOMMU code currently tries to allocate all DMAR fault interrupt
vectors on the boot cpu. On large systems with high DMAR counts this
results in vector exhaustion, and most of the vectors are not initially
allocated socket local.

Instead, have a cpu on each node do the vector allocation for the DMARs on
that node. The boot cpu still does the allocation for its node during its
boot sequence.

Signed-off-by: Dimitri Sivanich <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>

And I have confirmed that reverting this commit fixes my problem.

Following is my bisect log:
$ git bisect log
git bisect start
# status: waiting for both good and bad commits
# good: [a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6] Linux 6.9
git bisect good a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6
# status: waiting for bad commit, 1 good commit known
# bad: [1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0] Linux 6.10-rc1
git bisect bad 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
# good: [db5d28c0bfe566908719bec8e25443aabecbb802] Merge tag 'drm-next-2024-05-15' of https://gitlab.freedesktop.org/drm/kernel
git bisect good db5d28c0bfe566908719bec8e25443aabecbb802
# good: [db5d28c0bfe566908719bec8e25443aabecbb802] Merge tag 'drm-next-2024-05-15' of https://gitlab.freedesktop.org/drm/kernel
git bisect good db5d28c0bfe566908719bec8e25443aabecbb802
# bad: [a90f1cd105c6c5c246f07ca371d873d35b78c7d9] Merge tag 'turbostat-for-Linux-6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
git bisect bad a90f1cd105c6c5c246f07ca371d873d35b78c7d9
# good: [8b35a3bb33b57bc2cb2694a50e49e0ea01b9ff6f] Merge tag 'pmdomain-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm
git bisect good 8b35a3bb33b57bc2cb2694a50e49e0ea01b9ff6f
# bad: [619b92b9c8fe5369503ae948ad4e0a9c195c2c4a] Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
git bisect bad 619b92b9c8fe5369503ae948ad4e0a9c195c2c4a
# good: [91b6163be404e36baea39fc978e4739fd0448ebd] Merge tag 'sysctl-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
git bisect good 91b6163be404e36baea39fc978e4739fd0448ebd
# bad: [0cc6f45cecb46cefe89c17ec816dc8cd58a2229a] Merge tag 'iommu-updates-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
git bisect bad 0cc6f45cecb46cefe89c17ec816dc8cd58a2229a
# good: [89721e3038d181bacbd6be54354b513fdf1b4f10] Merge tag 'net-accept-more-20240515' of git://git.kernel.dk/linux
git bisect good 89721e3038d181bacbd6be54354b513fdf1b4f10
# good: [89721e3038d181bacbd6be54354b513fdf1b4f10] Merge tag 'net-accept-more-20240515' of git://git.kernel.dk/linux
git bisect good 89721e3038d181bacbd6be54354b513fdf1b4f10
# good: [de111f6b4f6a3010020825d22a068f416bc29c95] iommu/amd: Enable Guest Translation after reading IOMMU feature register
git bisect good de111f6b4f6a3010020825d22a068f416bc29c95
# good: [da55da5a42d4247d7a48b843fa5fcd9a4a10f4fe] iommu/arm-smmu-v3: Make the kunit into a module
git bisect good da55da5a42d4247d7a48b843fa5fcd9a4a10f4fe
# bad: [ba00196ca41c4f6d0b0d3c4a6748a133577abe05] iommu/vt-d: Decouple igfx_off from graphic identity mapping
git bisect bad ba00196ca41c4f6d0b0d3c4a6748a133577abe05
# bad: [446a68c58d2e5b8140d474f1a74082aebeee9bb0] iommu/vt-d: Add trace events for cache tag interface
git bisect bad 446a68c58d2e5b8140d474f1a74082aebeee9bb0
# bad: [cc9e49d35b4de47d6b656ac144cb22b11dc65c2e] iommu/vt-d: Remove debugfs use of private data field
git bisect bad cc9e49d35b4de47d6b656ac144cb22b11dc65c2e
# good: [9e7ee0f045395dc8aa55fbdc164c062484f4c88d] iommu/vt-d: Use try_cmpxchg64{,_local}() in iommu.c
git bisect good 9e7ee0f045395dc8aa55fbdc164c062484f4c88d
# bad: [d74169ceb0d2e32438946a2f1f9fc8c803304bd6] iommu/vt-d: Allocate DMAR fault interrupts locally
git bisect bad d74169ceb0d2e32438946a2f1f9fc8c803304bd6
# first bad commit: [d74169ceb0d2e32438946a2f1f9fc8c803304bd6] iommu/vt-d: Allocate DMAR fault interrupts locally
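
For anyone re-checking a log like the one above, the following is a minimal sketch (not part of the original report) that pulls the verdicts and the first bad commit out of `git bisect log` output. The function name `summarize_bisect_log` and the `sample` excerpt are illustrative assumptions; the regular expressions only assume the `# good: [sha] subject` comment format visible in the log. Verdicts that appear twice are counted once.

```python
import re

# Verdict comments look like "# good: [<sha>] <subject>" in `git bisect log` output.
VERDICT = re.compile(r"^# (good|bad): \[([0-9a-f]{7,40})\] (.*)$")
FIRST_BAD = re.compile(r"^# first bad commit: \[([0-9a-f]{7,40})\] (.*)$")


def summarize_bisect_log(text):
    """Return (verdicts, first_bad) parsed from bisect log text.

    verdicts is a list of (state, sha, subject) tuples in log order,
    with repeated entries dropped; first_bad is (sha, subject) or None.
    """
    verdicts = []
    seen = set()
    first_bad = None
    for raw in text.splitlines():
        line = raw.strip()
        m = VERDICT.match(line)
        if m:
            key = (m.group(1), m.group(2))
            if key not in seen:  # skip a verdict that was recorded twice
                seen.add(key)
                verdicts.append((m.group(1), m.group(2), m.group(3)))
            continue
        m = FIRST_BAD.match(line)
        if m:
            first_bad = (m.group(1), m.group(2))
    return verdicts, first_bad


# Hypothetical excerpt: a few lines copied from the log above.
sample = """\
# good: [a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6] Linux 6.9
# bad: [1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0] Linux 6.10-rc1
# good: [9e7ee0f045395dc8aa55fbdc164c062484f4c88d] iommu/vt-d: Use try_cmpxchg64{,_local}() in iommu.c
# bad: [d74169ceb0d2e32438946a2f1f9fc8c803304bd6] iommu/vt-d: Allocate DMAR fault interrupts locally
# first bad commit: [d74169ceb0d2e32438946a2f1f9fc8c803304bd6] iommu/vt-d: Allocate DMAR fault interrupts locally
"""

verdicts, first_bad = summarize_bisect_log(sample)
print(len(verdicts), "verdicts; first bad commit:", first_bad[0][:12])
```

Feeding it the full log instead of the excerpt should report the same first bad commit, which is a quick sanity check before replaying the log with `git bisect replay`.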


FYI
David



2024-06-13 11:05:25

by Pavel Machek

Subject: Re: [Regression] 6.10-rc1: Fail to resurrect from suspend.

Hi!

> My system fails to resume after `systemctl suspend` with 6.10-rc1.
> When I press the power button, the machine sounds like it is starting
> (fans roaring), but my keyboard, mouse, and monitor are not powered, and
> I have no option but to power-cycle the system.
>
> I ran a bisect session and narrowed it down to the following commit:
>
> commit d74169ceb0d2e32438946a2f1f9fc8c803304bd6
> Author: Dimitri Sivanich <[email protected]>
> Date: Wed Apr 24 15:16:29 2024 +0800
>
> iommu/vt-d: Allocate DMAR fault interrupts locally
>
> The Intel IOMMU code currently tries to allocate all DMAR fault interrupt
> vectors on the boot cpu. On large systems with high DMAR counts this
> results in vector exhaustion, and most of the vectors are not initially
> allocated socket local.
>
> Instead, have a cpu on each node do the vector allocation for the DMARs on
> that node. The boot cpu still does the allocation for its node during its
> boot sequence.
>
> Signed-off-by: Dimitri Sivanich <[email protected]>
> Reviewed-by: Kevin Tian <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> Signed-off-by: Lu Baolu <[email protected]>
> Signed-off-by: Joerg Roedel <[email protected]>
>
> And I have confirmed that reverting this commit fixes my problem.

This is a bisected regression. Should we simply revert the patch?

Pavel
--
People of Russia, stop Putin before his war on Ukraine escalates.



2024-06-13 11:45:40

by Thorsten Leemhuis

Subject: Re: [Regression] 6.10-rc1: Fail to resurrect from suspend.

On 13.06.24 13:04, Pavel Machek wrote:
>> My system fails to resume after `systemctl suspend` with 6.10-rc1.
>> When I press the power button, the machine sounds like it is starting
>> (fans roaring), but my keyboard, mouse, and monitor are not powered, and
>> I have no option but to power-cycle the system.
>>
>> I ran a bisect session and narrowed it down to the following commit:
>>
>> commit d74169ceb0d2e32438946a2f1f9fc8c803304bd6
>> Author: Dimitri Sivanich <[email protected]>
>> Date: Wed Apr 24 15:16:29 2024 +0800
>>
>> iommu/vt-d: Allocate DMAR fault interrupts locally
>>
> [...]
>> And I have confirmed that reverting this commit fixes my problem.
>
> Bisected regression. Should we simply revert the patch?

No need to, afaics, as this is fixed by "iommu/amd: Fix panic
accessing amd_iommu_enable_faulting", which Joerg committed earlier today.

https://lore.kernel.org/all/ZljHE%[email protected]/
https://lore.kernel.org/all/[email protected]/

Ciao, Thorsten