2023-12-14 01:50:16

by Jonathan Woithe

Subject: [Regression] Commit 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")

Hi

Following an update from 5.15.72 to 5.15.139 on one of my machines, the
console froze part way through the boot process. The machine still managed
to boot: it could be reached via the network and a keyboard-initiated
shutdown would do the right thing. The problem was that the screen remained
static the whole time: the X login did not appear. Only a reboot would
restore the display's functionality.

Comparing boot logs between these two kernels showed that 5.15.139 reported
the following messages not seen with 5.15.72:

thunderbolt 0000:04:00.0: interrupt for TX ring 0 is already enabled
WARNING: CPU: 0 PID: 713 at drivers/thunderbolt/nhi.c:139 ring_interrupt_active+0x218/0x270 [thunderbolt]

thunderbolt 0000:04:00.0: interrupt for RX ring 0 is already enabled
WARNING: CPU: 0 PID: 713 at drivers/thunderbolt/nhi.c:139 ring_interrupt_active+0x218/0x270 [thunderbolt]

radeon 0000:4b:00.0: Fatal error during GPU init
radeon: probe of 0000:4b:00.0 failed with error -12
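Kernel probe routines report failure as a negative errno value, so the "error -12" above can be decoded with the host's errno table. A minimal Python sketch, assuming a Linux host (where ENOMEM is 12); the helper name `decode_probe_error` is hypothetical:

```python
import errno
import os


def decode_probe_error(code: int) -> tuple[str, str]:
    """Map a negative kernel-style error code to its errno name and message.

    Hypothetical helper for reading log lines such as
    "probe of ... failed with error -12".
    """
    num = -code  # kernel functions return -ERRNO on failure
    name = errno.errorcode.get(num, "UNKNOWN")
    return name, os.strerror(num)


print(decode_probe_error(-12))  # on Linux: ('ENOMEM', 'Cannot allocate memory')
```

This only names the error the driver returned; on its own it does not identify which allocation or resource request failed.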

The fatal error during GPU initialisation would be the reason behind the
frozen screen. I don't know if the thunderbolt warnings are significant.
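For context on the thunderbolt warnings: ring_interrupt_active() in drivers/thunderbolt/nhi.c appears to read a 32-bit interrupt-enable register, set or clear the ring's bit, and warn when that would leave the register unchanged, i.e. the bit was already in the requested state. The Python model below is a simplified sketch of that check, not the driver code; the class name `RingInterruptModel`, the separate TX/RX register arrays and the 32-bits-per-register layout are assumptions for illustration. It does not explain why the bit was already set on this machine.

```python
class RingInterruptModel:
    """Simplified, hypothetical model of the NHI ring interrupt-enable check.

    Assumption: one enable bit per ring, packed 32 per register, with TX and
    RX rings kept in separate register arrays so they never share a bit.
    """

    def __init__(self, num_registers: int = 4) -> None:
        self.regs = {"TX ring": [0] * num_registers, "RX ring": [0] * num_registers}
        self.warnings: list[str] = []

    def set_active(self, ring_type: str, index: int, active: bool) -> None:
        regs = self.regs[ring_type]
        reg = index // 32          # which 32-bit register holds this ring's bit
        mask = 1 << (index % 32)   # the ring's bit within that register
        old = regs[reg]
        new = old | mask if active else old & ~mask
        if new == old:
            # Mirrors the "already enabled/disabled" warning seen in the log.
            state = "enabled" if active else "disabled"
            self.warnings.append(f"interrupt for {ring_type} {index} is already {state}")
        regs[reg] = new


m = RingInterruptModel()
m.set_active("TX ring", 0, True)  # first enable: bit was clear, no warning
m.set_active("TX ring", 0, True)  # second enable: bit already set, warning
print(m.warnings)  # ['interrupt for TX ring 0 is already enabled']
```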

A git bisect resulted in the following report:

d9ce077f8b1f731407e6b612b03bba464fd18d9b is the first bad commit
commit d9ce077f8b1f731407e6b612b03bba464fd18d9b
Author: Igor Mammedov <[email protected]>
Date: Mon Apr 24 21:15:57 2023 +0200

PCI: acpiphp: Reassign resources on bridge if necessary

[ Upstream commit 40613da52b13fb21c5566f10b287e0ca8c12c4e9 ]
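git bisect is a binary search between a known-good and a known-bad kernel: each build-and-boot test roughly halves the remaining range, so n candidate commits need about log2(n) tests. A minimal Python sketch over a linear history (real git bisect also handles merges), assuming a hypothetical `is_bad` predicate standing in for "build, boot and check the display", and assuming every commit after the first bad one is also bad:

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    `commits` is ordered oldest to newest; commits[0] is assumed good and
    commits[-1] bad, like the "git bisect good" / "git bisect bad" endpoints.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):  # one build-and-boot test per iteration
            hi = mid
        else:
            lo = mid
    return commits[hi]


history = [f"c{i}" for i in range(20)]
print(first_bad_commit(history, lambda c: int(c[1:]) >= 13))  # c13
```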

It's taken me a while to work through the bisect process due to limited
access to the machine concerned. I see that in the last few days there have
been other reports associated with this commit. The symptoms on my machine
are different from those of the other reporters. In particular, I note that I'm
running the Linux kernel on bare metal.

For what it's worth, I also experienced the same problem when I tested 6.6.4
last week (the most recent kernel at the time of testing).

The output of lspci is given at the end of this post[1]. The CPU is an
"Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz" which is not overclocked. Please
let me know if you'd like more information about the affected machine. I
can also perform additional tests if required, although for various reasons
these can only be done on Thursdays at present.

The kernel configuration file can easily be supplied if that would be
useful.

Regards
jonathan

[1] lspci output

00:00.0 Host bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DMI2 (rev 02)
00:01.0 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 1 (rev 02)
00:01.1 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 1 (rev 02)
00:03.0 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 3 (rev 02)
00:05.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Address Map, VTd_Misc, System Management (rev 02)
00:05.1 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Hot Plug (rev 02)
00:05.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 RAS, Control Status and Global Errors (rev 02)
00:11.0 Unassigned class [ff00]: Intel Corporation C610/X99 series chipset SPSR (rev 05)
00:14.0 USB controller: Intel Corporation C610/X99 series chipset USB xHCI Host Controller (rev 05)
00:16.0 Communication controller: Intel Corporation C610/X99 series chipset MEI Controller #1 (rev 05)
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (2) I218-V (rev 05)
00:1a.0 USB controller: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #2 (rev 05)
00:1b.0 Audio device: Intel Corporation C610/X99 series chipset HD Audio Controller (rev 05)
00:1c.0 PCI bridge: Intel Corporation C610/X99 series chipset PCI Express Root Port #1 (rev d5)
00:1c.3 PCI bridge: Intel Corporation C610/X99 series chipset PCI Express Root Port #4 (rev d5)
00:1d.0 USB controller: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation C610/X99 series chipset LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation C610/X99 series chipset 6-Port SATA Controller [AHCI mode] (rev 05)
00:1f.3 SMBus: Intel Corporation C610/X99 series chipset SMBus Controller (rev 05)
02:00.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
03:00.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
03:01.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
03:02.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
03:04.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
04:00.0 System peripheral: Intel Corporation DSL6540 Thunderbolt 3 NHI [Alpine Ridge 4C 2015]
49:00.0 USB controller: Intel Corporation DSL6540 USB 3.1 Controller [Alpine Ridge]
4b:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cedar [Radeon HD 5000/6000/7350/8350 Series]
4b:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cedar HDMI Audio [Radeon HD 5400/6300/7300 Series]
4d:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
ff:0b.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 R3 QPI Link 0 & 1 Monitoring (rev 02)
ff:0b.1 Performance counters: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 R3 QPI Link 0 & 1 Monitoring (rev 02)
ff:0b.2 Performance counters: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 R3 QPI Link 0 & 1 Monitoring (rev 02)
ff:0c.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers (rev 02)
ff:0c.1 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers (rev 02)
ff:0c.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers (rev 02)
ff:0c.3 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers (rev 02)
ff:0c.4 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers (rev 02)
ff:0c.5 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers (rev 02)
ff:0f.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Buffered Ring Agent (rev 02)
ff:0f.1 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Buffered Ring Agent (rev 02)
ff:0f.4 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 System Address Decoder & Broadcast Registers (rev 02)
ff:0f.5 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 System Address Decoder & Broadcast Registers (rev 02)
ff:0f.6 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 System Address Decoder & Broadcast Registers (rev 02)
ff:10.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCIe Ring Interface (rev 02)
ff:10.1 Performance counters: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCIe Ring Interface (rev 02)
ff:10.5 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers (rev 02)
ff:10.6 Performance counters: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers (rev 02)
ff:10.7 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers (rev 02)
ff:12.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Home Agent 0 (rev 02)
ff:12.1 Performance counters: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Home Agent 0 (rev 02)
ff:13.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Target Address, Thermal & RAS Registers (rev 02)
ff:13.1 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Target Address, Thermal & RAS Registers (rev 02)
ff:13.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder (rev 02)
ff:13.3 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder (rev 02)
ff:13.4 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder (rev 02)
ff:13.5 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder (rev 02)
ff:13.6 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Channel 0/1 Broadcast (rev 02)
ff:13.7 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Global Broadcast (rev 02)
ff:14.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 0 Thermal Control (rev 02)
ff:14.1 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 1 Thermal Control (rev 02)
ff:14.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 0 ERROR Registers (rev 02)
ff:14.3 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 1 ERROR Registers (rev 02)
ff:14.4 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 (rev 02)
ff:14.5 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 (rev 02)
ff:14.6 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 (rev 02)
ff:14.7 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 (rev 02)
ff:15.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 2 Thermal Control (rev 02)
ff:15.1 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 3 Thermal Control (rev 02)
ff:15.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 2 ERROR Registers (rev 02)
ff:15.3 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 3 ERROR Registers (rev 02)
ff:16.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 1 Target Address, Thermal & RAS Registers (rev 02)
ff:16.6 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Channel 2/3 Broadcast (rev 02)
ff:16.7 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Global Broadcast (rev 02)
ff:17.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 1 Channel 0 Thermal Control (rev 02)
ff:17.4 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 (rev 02)
ff:17.5 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 (rev 02)
ff:17.6 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 (rev 02)
ff:17.7 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 (rev 02)
ff:1e.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit (rev 02)
ff:1e.1 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit (rev 02)
ff:1e.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit (rev 02)
ff:1e.3 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit (rev 02)
ff:1e.4 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit (rev 02)
ff:1f.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 VCU (rev 02)
ff:1f.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 VCU (rev 02)


2023-12-14 13:32:30

by Igor Mammedov

Subject: Re: [Regression] Commit 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")

On Thu, 14 Dec 2023 11:58:20 +1030
Jonathan Woithe <[email protected]> wrote:

> Hi
>
> Following an update from 5.15.72 to 5.15.139 on one of my machines, the

It looks like you are running a downstream kernel. Can you file a bug report
with the distro that you use (with a link posted here as well)?

For now, the offending patches are being reverted, so a downstream bug will help
with tracking the issue and reverting it there.

> console froze part way through the boot process. The machine still managed
> to boot: it could be reached via the network and a keyboard-initiated
> shutdown would do the right thing. The problem was that the screen remained
> static the whole time: the X login did not appear. Only a reboot would
> restore the display's functionality.
>
> Comparing boot logs between these two kernels showed that 5.15.139 reported
> the following messages not seen with 5.15.72:
>
> thunderbolt 0000:04:00.0: interrupt for TX ring 0 is already enabled
> WARNING: CPU: 0 PID: 713 at drivers/thunderbolt/nhi.c:139 ring_interrupt_active+0x218/0x270 [thunderbolt]
>
> thunderbolt 0000:04:00.0: interrupt for RX ring 0 is already enabled
> WARNING: CPU: 0 PID: 713 at drivers/thunderbolt/nhi.c:139 ring_interrupt_active+0x218/0x270 [thunderbolt]
>
> radeon 0000:4b:00.0: Fatal error during GPU init
> radeon: probe of 0000:4b:00.0 failed with error -12
>
> The fatal error during GPU initialisation would be the reason behind the
> frozen screen. I don't know if the thunderbolt warnings are significant.
>
> A git bisect resulted in the following report:
>
> d9ce077f8b1f731407e6b612b03bba464fd18d9b is the first bad commit
> commit d9ce077f8b1f731407e6b612b03bba464fd18d9b
> Author: Igor Mammedov <[email protected]>
> Date: Mon Apr 24 21:15:57 2023 +0200
>
> PCI: acpiphp: Reassign resources on bridge if necessary
>
> [ Upstream commit 40613da52b13fb21c5566f10b287e0ca8c12c4e9 ]
>
> It's taken me a while to work through the bisect process due to limited
> access to the machine concerned. I see that in the last few days there have
> been other reports associated with this commit. The symptoms on my machine
> are different from those of the other reporters. In particular, I note that I'm
> running the Linux kernel on bare metal.
>
> For what it's worth, I also experienced the same problem when I tested 6.6.4
> last week (the most recent kernel at the time of testing).
>
> The output of lspci is given at the end of this post[1]. The CPU is an
> "Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz" which is not overclocked. Please
> let me know if you'd like more information about the affected machine. I
> can also perform additional tests if required, although for various reasons
> these can only be done on Thursdays at present.
>
> The kernel configuration file can easily be supplied if that would be
> useful.

A full dmesg log and the config used might help down the road (preferably with the
current upstream kernel), as I will be looking into fixing related issues.
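When collecting these logs, the comparison from the original report (messages that appear with 5.15.139 but not with 5.15.72) can be partly automated. A minimal Python sketch, assuming both logs were saved as text with dmesg's default "[ seconds.micro]" timestamp prefix; the function names `normalise` and `new_messages` and the sample lines are hypothetical:

```python
import re

# dmesg's default prefix looks like "[    1.234567] "; strip it so lines from
# different boots can be compared.
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalise(line: str) -> str:
    return _TIMESTAMP.sub("", line.rstrip("\n"))


def new_messages(old_log: str, new_log: str) -> list[str]:
    """Lines present in new_log but absent from old_log, in new_log order."""
    seen = {normalise(line) for line in old_log.splitlines()}
    out = []
    for line in new_log.splitlines():
        msg = normalise(line)
        if msg and msg not in seen:
            out.append(msg)
    return out


# Made-up timestamps; message text taken from the report above.
old = "[    1.000000] pci 0000:00:00.0: ok\n"
new = ("[    1.100000] pci 0000:00:00.0: ok\n"
       "[    9.500000] radeon: probe of 0000:4b:00.0 failed with error -12\n")
print(new_messages(old, new))  # ['radeon: probe of 0000:4b:00.0 failed with error -12']
```

Lines that differ only in addresses, PIDs or ordering will still show up as "new", so the output is a starting point for manual review rather than a definitive diff.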

Perhaps a better way to track this issue and collect logs would be to open
a separate bug at https://bugzilla.kernel.org (please CC me as well).


2023-12-14 22:15:07

by Jonathan Woithe

Subject: Re: [Regression] Commit 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")

On Thu, Dec 14, 2023 at 02:32:05PM +0100, Igor Mammedov wrote:
> On Thu, 14 Dec 2023 11:58:20 +1030 Jonathan Woithe wrote:
> >
> > Following an update from 5.15.72 to 5.15.139 on one of my machines, the
>
> It looks like you are running a downstream kernel. Can you file a bug report
> with the distro that you use (with a link posted here as well)?

I am running Slackware64 15.0. The kernels supplied by that distribution
are unmodified kernel.org kernels.

> For now, the offending patches are being reverted, so a downstream bug will help
> with tracking the issue and reverting it there.

The patches will be reverted in Slackware as a matter of course when a
kernel.org "-stable" kernel with the fix is adopted. Slackware does not
apply any patches to kernel.org kernels. Nevertheless, I will post about
this in the forum, hopefully later today.

> > console froze part way through the boot process. The machine still managed
> > to boot: it could be reached via the network and a keyboard-initiated
> > shutdown would do the right thing. The problem was that the screen remained
> > static the whole time: the X login did not appear. Only a reboot would
> > restore the display's functionality.
> >
> > Comparing boot logs between these two kernels showed that 5.15.139 reported
> > the following messages not seen with 5.15.72:
> >
> > thunderbolt 0000:04:00.0: interrupt for TX ring 0 is already enabled
> > WARNING: CPU: 0 PID: 713 at drivers/thunderbolt/nhi.c:139 ring_interrupt_active+0x218/0x270 [thunderbolt]
> >
> > thunderbolt 0000:04:00.0: interrupt for RX ring 0 is already enabled
> > WARNING: CPU: 0 PID: 713 at drivers/thunderbolt/nhi.c:139 ring_interrupt_active+0x218/0x270 [thunderbolt]
> >
> > radeon 0000:4b:00.0: Fatal error during GPU init
> > radeon: probe of 0000:4b:00.0 failed with error -12
> >
> > The fatal error during GPU initialisation would be the reason behind the
> > frozen screen. I don't know if the thunderbolt warnings are significant.
> >
> > A git bisect resulted in the following report:
> >
> > d9ce077f8b1f731407e6b612b03bba464fd18d9b is the first bad commit
> > commit d9ce077f8b1f731407e6b612b03bba464fd18d9b
> > Author: Igor Mammedov <[email protected]>
> > Date: Mon Apr 24 21:15:57 2023 +0200
> >
> > PCI: acpiphp: Reassign resources on bridge if necessary
> >
> > [ Upstream commit 40613da52b13fb21c5566f10b287e0ca8c12c4e9 ]
> >
> > It's taken me a while to work through the bisect process due to limited
> > access to the machine concerned. I see that in the last few days there have
> > been other reports associated with this commit. The symptoms on my machine
> > are different from those of the other reporters. In particular, I note that I'm
> > running the Linux kernel on bare metal.
> >
> > For what it's worth, I also experienced the same problem when I tested 6.6.4
> > last week (the most recent kernel at the time of testing).
> >
> > The output of lspci is given at the end of this post[1]. The CPU is an
> > "Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz" which is not overclocked. Please
> > let me know if you'd like more information about the affected machine. I
> > can also perform additional tests if required, although for various reasons
> > these can only be done on Thursdays at present.
> >
> > The kernel configuration file can easily be supplied if that would be
> > useful.
>
> A full dmesg log and the config used might help down the road (preferably with the
> current upstream kernel), as I will be looking into fixing related issues.
>
> Perhaps a better way to track this issue and collect logs would be to open
> a separate bug at https://bugzilla.kernel.org (please CC me as well).

Sure, will do. I'll be able to retrieve the dmesg log from my earlier tests and
the config easily enough. Testing with another kernel will have to wait until
next Thursday as that is when I'll next have physical access to the machine.

Which upstream kernel would you like me to test with: the latest "-stable",
or the most recent release?

Regards
jonathan

> > ff:10.7 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers (rev 02)
> > ff:12.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Home Agent 0 (rev 02)
> > ff:12.1 Performance counters: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Home Agent 0 (rev 02)
> > ff:13.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Target Address, Thermal & RAS Registers (rev 02)
> > ff:13.1 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Target Address, Thermal & RAS Registers (rev 02)
> > ff:13.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder (rev 02)
> > ff:13.3 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder (rev 02)
> > ff:13.4 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder (rev 02)
> > ff:13.5 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder (rev 02)
> > ff:13.6 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Channel 0/1 Broadcast (rev 02)
> > ff:13.7 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Global Broadcast (rev 02)
> > ff:14.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 0 Thermal Control (rev 02)
> > ff:14.1 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 1 Thermal Control (rev 02)
> > ff:14.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 0 ERROR Registers (rev 02)
> > ff:14.3 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 1 ERROR Registers (rev 02)
> > ff:14.4 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 (rev 02)
> > ff:14.5 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 (rev 02)
> > ff:14.6 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 (rev 02)
> > ff:14.7 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 (rev 02)
> > ff:15.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 2 Thermal Control (rev 02)
> > ff:15.1 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 3 Thermal Control (rev 02)
> > ff:15.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 2 ERROR Registers (rev 02)
> > ff:15.3 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 3 ERROR Registers (rev 02)
> > ff:16.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 1 Target Address, Thermal & RAS Registers (rev 02)
> > ff:16.6 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Channel 2/3 Broadcast (rev 02)
> > ff:16.7 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Global Broadcast (rev 02)
> > ff:17.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 1 Channel 0 Thermal Control (rev 02)
> > ff:17.4 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 (rev 02)
> > ff:17.5 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 (rev 02)
> > ff:17.6 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 (rev 02)
> > ff:17.7 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 (rev 02)
> > ff:1e.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit (rev 02)
> > ff:1e.1 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit (rev 02)
> > ff:1e.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit (rev 02)
> > ff:1e.3 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit (rev 02)
> > ff:1e.4 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit (rev 02)
> > ff:1f.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 VCU (rev 02)
> > ff:1f.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 VCU (rev 02)

2023-12-15 09:29:38

by Igor Mammedov

Subject: Re: [Regression] Commit 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")

On Fri, 15 Dec 2023 08:43:27 +1030
Jonathan Woithe <[email protected]> wrote:

> On Thu, Dec 14, 2023 at 02:32:05PM +0100, Igor Mammedov wrote:
> > On Thu, 14 Dec 2023 11:58:20 +1030 Jonathan Woithe wrote:
> > >
> > > Following an update from 5.15.72 to 5.15.139 on one of my machines, the
> >
> > looks like you are running downstream kernel, can you file bug report
> > with distro that you use (with a link posted here as well).
>
> I am running Slackware64 15.0. The kernels supplied by that distribution
> are unmodified kernel.org kernels.
>
> > For now offending patches are being reverted, so downstream bug will help
> > with tracking it and reverting it there.
>
> The patches will be reverted in Slackware as a matter of course when a
> kernel.org "-stable" kernel with the fix is adopted. Slackware does not
> apply any patches to kernel.org kernels. Nevertheless, I will raise a post
> in the forum, hopefully later today.
>
> > > console froze part way through the boot process. The machine still managed
> > > to boot: it could be reached via the network and a keyboard-initiated
> > > shutdown would do the right thing. The problem was that the screen remained
> > > static the whole time: the X login did not appear. Only a reboot would
> > > restore the display's functionality.
> > >
> > > Comparing boot logs between these two kernels showed that 5.15.139 reported
> > > the following messages not seen with 5.15.72:
> > >
> > > thunderbolt 0000:04:00.0: interrupt for TX ring 0 is already enabled
> > > WARNING: CPU: 0 PID: 713 at drivers/thunderbolt/nhi.c:139 ring_interrupt_active+0x218/0x270 [thunderbolt]
> > >
> > > thunderbolt 0000:04:00.0: interrupt for RX ring 0 is already enabled
> > > WARNING: CPU: 0 PID: 713 at drivers/thunderbolt/nhi.c:139 ring_interrupt_active+0x218/0x270 [thunderbolt]
> > >
> > > radeon 0000:4b:00.0: Fatal error during GPU init
> > > radeon: probe of 0000:4b:00.0 failed with error -12
> > >
> > > The fatal error during GPU initialisation would be the reason behind the
> > > frozen screen. I don't know if the thunderbolt warnings are significant.
> > >
> > > A git bisect resulted in the following report:
> > >
> > > d9ce077f8b1f731407e6b612b03bba464fd18d9b is the first bad commit
> > > commit d9ce077f8b1f731407e6b612b03bba464fd18d9b
> > > Author: Igor Mammedov <[email protected]>
> > > Date: Mon Apr 24 21:15:57 2023 +0200
> > >
> > > PCI: acpiphp: Reassign resources on bridge if necessary
> > >
> > > [ Upstream commit 40613da52b13fb21c5566f10b287e0ca8c12c4e9 ]
> > >
> > > It's taken me a while to work through the bisect process due to limited
> > > access to the machine concerned. I see that in the last few days there have
> > > been other reports associated with this commit. The symptoms on my machine
> > > are different to the other reporters. In particular, I note that I'm
> > > running the Linux kernel on bare metal.
> > >
> > > For what it's worth, I also experienced the same problem when I tested 6.6.4
> > > last week (the most recent kernel at the time of testing).
> > >
> > > The output of lspci is given at the end of this post[1]. The CPU is an
> > > "Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz" which is not overclocked. Please
> > > let me know if you'd like more information about the affected machine. I
> > > can also perform additional tests if required, although for various reasons
> > > these can only be done on Thursdays at present.

Can you also provide the 'lspci -tv' output and the machine model, for the record?

> > > The kernel configuration file can easily be supplied if that would be
> > > useful.
> >
> > full dmesg log and used config might help down the road (preferably with current
> > upstream kernel), as I will be looking into fixing related issues.
> >
> > Perhaps a better way for taking this issue and collecting logs, will be
> > opening a separate bug at https://bugzilla.kernel.org (pls CC me as well)
>
> Sure, will do. I'll be able to get the dmesg log from my earlier tests and
> config easily enough. Testing with another kernel will have to wait until
> next Thursday as that is when I'll next have physical access to the machine.
>
> Which upstream kernel would you like me to test with: the latest "-stable",
> or the most recent release?

The current master branch, which still has the offending patches, would do
(or any recent kernel, as long as you specify its commit ID).

Also add:

dyndbg="file drivers/pci/access.c +p; file drivers/pci/hotplug/acpiphp_glue.c +p; file drivers/pci/bus.c +p; file drivers/pci/pci.c +p; file drivers/pci/setup-bus.c +p" ignore_loglevel

to the kernel command line to get more data from the PCI/acpiphp enumeration process.
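The dyndbg= value above is a list of dynamic-debug queries, one "file <path> +p" per source file, joined with "; ". A minimal sketch of a helper that assembles such a string from a list of files, useful if more files need to be added later. The function name build_dyndbg is hypothetical; the output still has to be appended (together with ignore_loglevel) to the boot loader's kernel command line by hand.

```shell
#!/bin/sh
# build_dyndbg FILE... (hypothetical helper)
# Prints a dyndbg="..." kernel parameter that enables pr_debug()/dev_dbg()
# output (the +p flag) for every listed kernel source file.
build_dyndbg() {
  query=""
  for f in "$@"; do
    # Join the per-file queries with "; ", as in the parameter quoted above.
    query="${query:+$query; }file $f +p"
  done
  printf 'dyndbg="%s"\n' "$query"
}
```

For example, build_dyndbg drivers/pci/access.c drivers/pci/hotplug/acpiphp_glue.c drivers/pci/bus.c drivers/pci/pci.c drivers/pci/setup-bus.c reproduces the dyndbg= value requested above. After booting, the entries for those files in /sys/kernel/debug/dynamic_debug/control (with debugfs mounted) should show the =p flag if the query was accepted.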

> Regards
> jonathan
>
> > > [1] lspci output



> > >
> > > 00:00.0 Host bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DMI2 (rev 02)
> > > 00:01.0 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 1 (rev 02)
> > > 00:01.1 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 1 (rev 02)
> > > 00:03.0 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 3 (rev 02)
> > > 00:05.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Address Map, VTd_Misc, System Management (rev 02)
> > > 00:05.1 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Hot Plug (rev 02)
> > > 00:05.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 RAS, Control Status and Global Errors (rev 02)
> > > 00:11.0 Unassigned class [ff00]: Intel Corporation C610/X99 series chipset SPSR (rev 05)
> > > 00:14.0 USB controller: Intel Corporation C610/X99 series chipset USB xHCI Host Controller (rev 05)
> > > 00:16.0 Communication controller: Intel Corporation C610/X99 series chipset MEI Controller #1 (rev 05)
> > > 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (2) I218-V (rev 05)
> > > 00:1a.0 USB controller: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #2 (rev 05)
> > > 00:1b.0 Audio device: Intel Corporation C610/X99 series chipset HD Audio Controller (rev 05)
> > > 00:1c.0 PCI bridge: Intel Corporation C610/X99 series chipset PCI Express Root Port #1 (rev d5)
> > > 00:1c.3 PCI bridge: Intel Corporation C610/X99 series chipset PCI Express Root Port #4 (rev d5)
> > > 00:1d.0 USB controller: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #1 (rev 05)
> > > 00:1f.0 ISA bridge: Intel Corporation C610/X99 series chipset LPC Controller (rev 05)
> > > 00:1f.2 SATA controller: Intel Corporation C610/X99 series chipset 6-Port SATA Controller [AHCI mode] (rev 05)
> > > 00:1f.3 SMBus: Intel Corporation C610/X99 series chipset SMBus Controller (rev 05)
> > > 02:00.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
> > > 03:00.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
> > > 03:01.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
> > > 03:02.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
> > > 03:04.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
> > > 04:00.0 System peripheral: Intel Corporation DSL6540 Thunderbolt 3 NHI [Alpine Ridge 4C 2015]
> > > 49:00.0 USB controller: Intel Corporation DSL6540 USB 3.1 Controller [Alpine Ridge]
> > > 4b:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cedar [Radeon HD 5000/6000/7350/8350 Series]
> > > 4b:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cedar HDMI Audio [Radeon HD 5400/6300/7300 Series]
> > > 4d:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
> > > ff:0b.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 R3 QPI Link 0 & 1 Monitoring (rev 02)
> > > ff:0b.1 Performance counters: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 R3 QPI Link 0 & 1 Monitoring (rev 02)
> > > ff:0b.2 Performance counters: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 R3 QPI Link 0 & 1 Monitoring (rev 02)
> > > ff:0c.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers (rev 02)
> > > ff:0c.1 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers (rev 02)
> > > ff:0c.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers (rev 02)
> > > ff:0c.3 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers (rev 02)
> > > ff:0c.4 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers (rev 02)
> > > ff:0c.5 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers (rev 02)
> > > ff:0f.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Buffered Ring Agent (rev 02)
> > > ff:0f.1 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Buffered Ring Agent (rev 02)
> > > ff:0f.4 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 System Address Decoder & Broadcast Registers (rev 02)
> > > ff:0f.5 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 System Address Decoder & Broadcast Registers (rev 02)
> > > ff:0f.6 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 System Address Decoder & Broadcast Registers (rev 02)
> > > ff:10.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCIe Ring Interface (rev 02)
> > > ff:10.1 Performance counters: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCIe Ring Interface (rev 02)
> > > ff:10.5 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers (rev 02)
> > > ff:10.6 Performance counters: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers (rev 02)
> > > ff:10.7 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers (rev 02)
> > > ff:12.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Home Agent 0 (rev 02)
> > > ff:12.1 Performance counters: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Home Agent 0 (rev 02)
> > > ff:13.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Target Address, Thermal & RAS Registers (rev 02)
> > > ff:13.1 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Target Address, Thermal & RAS Registers (rev 02)
> > > ff:13.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder (rev 02)
> > > ff:13.3 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder (rev 02)
> > > ff:13.4 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder (rev 02)
> > > ff:13.5 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder (rev 02)
> > > ff:13.6 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Channel 0/1 Broadcast (rev 02)
> > > ff:13.7 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Global Broadcast (rev 02)
> > > ff:14.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 0 Thermal Control (rev 02)
> > > ff:14.1 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 1 Thermal Control (rev 02)
> > > ff:14.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 0 ERROR Registers (rev 02)
> > > ff:14.3 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 1 ERROR Registers (rev 02)
> > > ff:14.4 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 (rev 02)
> > > ff:14.5 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 (rev 02)
> > > ff:14.6 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 (rev 02)
> > > ff:14.7 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 (rev 02)
> > > ff:15.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 2 Thermal Control (rev 02)
> > > ff:15.1 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 3 Thermal Control (rev 02)
> > > ff:15.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 2 ERROR Registers (rev 02)
> > > ff:15.3 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 3 ERROR Registers (rev 02)
> > > ff:16.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 1 Target Address, Thermal & RAS Registers (rev 02)
> > > ff:16.6 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Channel 2/3 Broadcast (rev 02)
> > > ff:16.7 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Global Broadcast (rev 02)
> > > ff:17.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 1 Channel 0 Thermal Control (rev 02)
> > > ff:17.4 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 (rev 02)
> > > ff:17.5 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 (rev 02)
> > > ff:17.6 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 (rev 02)
> > > ff:17.7 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 (rev 02)
> > > ff:1e.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit (rev 02)
> > > ff:1e.1 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit (rev 02)
> > > ff:1e.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit (rev 02)
> > > ff:1e.3 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit (rev 02)
> > > ff:1e.4 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit (rev 02)
> > > ff:1f.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 VCU (rev 02)
> > > ff:1f.2 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 VCU (rev 02)
>


2023-12-15 11:28:23

by Jonathan Woithe

Subject: Re: [Regression] Commit 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")

On Fri, Dec 15, 2023 at 08:43:29AM +1030, Jonathan Woithe wrote:
> On Thu, Dec 14, 2023 at 02:32:05PM +0100, Igor Mammedov wrote:
> > On Thu, 14 Dec 2023 11:58:20 +1030 Jonathan Woithe wrote:
> > >
> > > Following an update from 5.15.72 to 5.15.139 on one of my machines, the
> >
> > looks like you are running downstream kernel, can you file bug report
> > with distro that you use (with a link posted here as well).
>
> I am running Slackware64 15.0. The kernels supplied by that distribution
> are unmodified kernel.org kernels.
>
> > For now offending patches are being reverted, so downstream bug will help
> > with tracking it and reverting it there.
>
> The patches will be reverted in Slackware as a matter of course when a
> kernel.org "-stable" kernel with the fix is adopted. Slackware does not
> apply any patches to kernel.org kernels. Nevertheless, I will raise a post
> in the forum, hopefully later today.

This has now been done:

https://www.linuxquestions.org/questions/slackware-14/heads-up-pci-regression-introduced-in-or-around-5-15-129-commit-40613da52b13-4175731828/#post6470559

> > > The output of lspci is given at the end of this post[1]. The CPU is an
> > > "Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz" which is not overclocked. Please
> > > let me know if you'd like more information about the affected machine. I
> > > can also perform additional tests if required, although for various reasons
> > > these can only be done on Thursdays at present.
> > >
> > > The kernel configuration file can easily be supplied if that would be
> > > useful.
> >
> > full dmesg log and used config might help down the road (preferably with current
> > upstream kernel), as I will be looking into fixing related issues.
> >
> > Perhaps a better way for taking this issue and collecting logs, will be
> > opening a separate bug at https://bugzilla.kernel.org (pls CC me as well)
>
> Sure, will do. I'll be able to get the dmesg log from my earlier tests and
> config easily enough. Testing with another kernel will have to wait until
> next Thursday as that is when I'll next have physical access to the machine.

A bug has been opened at bugzilla.kernel.org as requested. The logs, kernel
configuration and the "lspci -tv" output (requested in a subsequent email)
have been added. The logs and kernel configuration are from the kernel.org
5.15.139 kernel. You have been added to the bug's CC. The bug number is
218268:

https://bugzilla.kernel.org/show_bug.cgi?id=218268

As mentioned, testing another kernel can only happen next Thursday. If
you would like other tests done let me know and I'll do them at the same
time. I have remote access to the machine, so it's possible to retrieve
information from it at any time.

Let me know if there's anything else I can do to assist.

Regards
jonathan

2023-12-15 13:37:14

by Igor Mammedov

Subject: Re: [Regression] Commit 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")

On Fri, 15 Dec 2023 21:57:43 +1030
Jonathan Woithe <[email protected]> wrote:

> On Fri, Dec 15, 2023 at 08:43:29AM +1030, Jonathan Woithe wrote:
> > On Thu, Dec 14, 2023 at 02:32:05PM +0100, Igor Mammedov wrote:
> > > On Thu, 14 Dec 2023 11:58:20 +1030 Jonathan Woithe wrote:
> > > >
> > > > Following an update from 5.15.72 to 5.15.139 on one of my machines, the
> > >
> > > looks like you are running downstream kernel, can you file bug report
> > > with distro that you use (with a link posted here as well).
> >
> > I am running Slackware64 15.0. The kernels supplied by that distribution
> > are unmodified kernel.org kernels.
> >
> > > For now offending patches are being reverted, so downstream bug will help
> > > with tracking it and reverting it there.
> >
> > The patches will be reverted in Slackware as a matter of course when a
> > kernel.org "-stable" kernel with the fix is adopted. Slackware does not
> > apply any patches to kernel.org kernels. Nevertheless, I will raise a post
> > in the forum, hopefully later today.
>
> This has now been done:
>
> https://www.linuxquestions.org/questions/slackware-14/heads-up-pci-regression-introduced-in-or-around-5-15-129-commit-40613da52b13-4175731828/#post6470559
>
> > > > The output of lspci is given at the end of this post[1]. The CPU is an
> > > > "Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz" which is not overclocked. Please
> > > > let me know if you'd like more information about the affected machine. I
> > > > can also perform additional tests if required, although for various reasons
> > > > these can only be done on Thursdays at present.
> > > >
> > > > The kernel configuration file can easily be supplied if that would be
> > > > useful.
> > >
> > > full dmesg log and used config might help down the road (preferably with current
> > > upstream kernel), as I will be looking into fixing related issues.
> > >
> > > Perhaps a better way for taking this issue and collecting logs, will be
> > > opening a separate bug at https://bugzilla.kernel.org (pls CC me as well)
> >
> > Sure, will do. I'll be able to get the dmesg log from my earlier tests and
> > config easily enough. Testing with another kernel will have to wait until
> > next Thursday as that is when I'll next have physical access to the machine.
>
> A bug has been opened at bugzilla.kernel.org as requested. The logs, kernel
> configuration and the "lspci -tv" output (requested in a subsequent email)
> have been added. The logs and kernel configuration are from the kernel.org
> 5.15.139 kernel. You have been added to the bug's CC. The bug number is
> 218268:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=218268
>
> As mentioned, testing another kernel can only happen next Thursday. If
> you would like other tests done let me know and I'll do them at the same
> time. I have remote access to the machine, so it's possible to retrieve
> information from it at any time.

Let's wait until you can get logs with dyndbg='...' (which I asked for earlier),
and also do one more test with "pci=realloc" on the kernel command line to see if that helps.
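A test run is only meaningful if the parameter actually reached the kernel, so it may be worth checking /proc/cmdline after rebooting. A minimal sketch, assuming the option is passed as a separate "pci=realloc" token; the helper name has_kernel_param is hypothetical, and it deliberately does not recognise combined forms such as "pci=realloc,nocrs".

```shell
#!/bin/sh
# has_kernel_param PARAM [CMDLINE]   (hypothetical helper)
# Succeeds if PARAM appears as a whole space-separated token in CMDLINE,
# which defaults to the running kernel's /proc/cmdline.
# Note: comma-combined options (e.g. pci=realloc,nocrs) are NOT matched.
has_kernel_param() {
  cmdline=${2-$(cat /proc/cmdline)}
  case " $cmdline " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, has_kernel_param pci=realloc && echo "pci=realloc is active" run on the test boot.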

> Let me know if there's anything else I can do to assist.

It looks like pci_assign_unassigned_bridge_resources() messed up the BIOS-configured
resources and then failed to reconfigure the bridges correctly, which left BARs
unassigned and caused the thunderbolt/VGA issues.
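Unassigned BARs normally leave a trace in the kernel log, and the "error -12" in the radeon probe failure quoted earlier is -ENOMEM, which is consistent with the GPU's memory BAR not being placed. A minimal sketch for pulling such lines out of a saved dmesg, assuming the 5.15-era wording "BAR N: no space for ..." / "BAR N: failed to assign ..." (newer kernels may phrase these messages differently); the function name bar_failures is hypothetical.

```shell
#!/bin/sh
# bar_failures   (hypothetical helper)
# Reads kernel log text on stdin and prints only the lines reporting BARs
# that the PCI core could not place. Assumes 5.15-era message wording.
bar_failures() {
  grep -E 'BAR [0-9]+: (no space for|failed to assign)'
}
```

For example, dmesg | bar_failures on the 5.15.139 boot, compared with the same command on 5.15.72.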

Something in the ACPI tables must be triggering the acpiphp hotplug path during boot.
Can you dump the DSDT and SSDT tables and attach them to the BZ?
PS:
To dump the tables you can use this command from acpica-tools (not sure what the package is called in Slackware):
acpidump -b
which will dump all tables in binary format (so attach those, or the ones de-compiled with 'iasl -d').
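The dump-and-decompile steps described above can be wrapped into one function. A minimal sketch, assuming acpidump and iasl are on PATH, that it is run as root, and that acpidump -b writes lowercase per-table files such as dsdt.dat and ssdtN.dat into the current directory; the function name and the default directory name acpi-tables are hypothetical.

```shell
#!/bin/sh
# dump_acpi_tables [DIR]   (hypothetical helper; run as root)
# Dumps all ACPI tables as binary files into DIR with "acpidump -b",
# disassembles the DSDT and SSDTs with "iasl -d", and packs DIR into a
# tarball suitable for attaching to a bug report.
dump_acpi_tables() {
  dir=${1:-acpi-tables}
  mkdir -p "$dir" || return 1
  (
    cd "$dir" || exit 1
    acpidump -b || exit 1        # one binary .dat file per table
    for f in dsdt.dat ssdt*.dat; do
      [ -e "$f" ] || continue    # skip the literal pattern if nothing matched
      iasl -d "$f"               # writes the de-compiled .dsl next to it
    done
  ) || return 1
  tar czf "$dir.tar.gz" "$dir"
}
```

Running the disassembly in a subshell keeps the caller's working directory unchanged even if one of the steps fails.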

>
> Regards
> jonathan
>


2023-12-15 23:36:03

by Jonathan Woithe

Subject: Re: [Regression] Commit 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")

On Fri, Dec 15, 2023 at 02:36:38PM +0100, Igor Mammedov wrote:
> On Fri, 15 Dec 2023 21:57:43 +1030
> Jonathan Woithe <[email protected]> wrote:
>
> > On Fri, Dec 15, 2023 at 08:43:29AM +1030, Jonathan Woithe wrote:
> > > On Thu, Dec 14, 2023 at 02:32:05PM +0100, Igor Mammedov wrote:
> > > > On Thu, 14 Dec 2023 11:58:20 +1030 Jonathan Woithe wrote:
> > > > >
> > > > > Following an update from 5.15.72 to 5.15.139 on one of my machines, the
> > > >
> > > > looks like you are running downstream kernel, can you file bug report
> > > > with distro that you use (with a link posted here as well).
> > >
> > > I am running Slackware64 15.0. The kernels supplied by that distribution
> > > are unmodified kernel.org kernels.
> > >
> > > > For now offending patches are being reverted, so downstream bug will help
> > > > with tracking it and reverting it there.
> > >
> > > The patches will be reverted in Slackware as a matter of course when a
> > > kernel.org "-stable" kernel with the fix is adopted. Slackware does not
> > > apply any patches to kernel.org kernels. Nevertheless, I will raise a post
> > > in the forum, hopefully later today.
> >
> > This has now been done:
> >
> > https://www.linuxquestions.org/questions/slackware-14/heads-up-pci-regression-introduced-in-or-around-5-15-129-commit-40613da52b13-4175731828/#post6470559
> >
> > > > > The output of lspci is given at the end of this post[1]. The CPU is an
> > > > > "Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz" which is not overclocked. Please
> > > > > let me know if you'd like more information about the affected machine. I
> > > > > can also perform additional tests if required, although for various reasons
> > > > > these can only be done on Thursdays at present.
> > > > >
> > > > > The kernel configuration file can easily be supplied if that would be
> > > > > useful.
> > > >
> > > > full dmesg log and used config might help down the road (preferably with current
> > > > upstream kernel), as I will be looking into fixing related issues.
> > > >
> > > > Perhaps a better way for taking this issue and collecting logs, will be
> > > > opening a separate bug at https://bugzilla.kernel.org (pls CC me as well)
> > >
> > > Sure, will do. I'll be able to get the dmesg log from my earlier tests and
> > > config easily enough. Testing with another kernel will have to wait until
> > > next Thursday as that is when I'll next have physical access to the machine.
> >
> > A bug has been opened at bugzilla.kernel.org as requested. The logs, kernel
> > configuration and the "lspci -tv" output (requested in a subsequent email)
> > have been added. The logs and kernel configuration are from the kernel.org
> > 5.15.139 kernel. You have been added to the bug's CC. The bug number is
> > 218268:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=218268
> >
> > As mentioned, testing another kernel can only happen next Thursday. If
> > you would like other tests done let me know and I'll do them at the same
> > time. I have remote access to the machine, so it's possible to retrieve
> > information from it at any time.
>
> lets wait till you can get logs with dyndbg='...' (I've asked for earlier)
> and one more test with "pci=realloc" on kernel CLI to see if that helps.

Okay.

> > Let me know if there's anything else I can do to assist.
>
> It looks like pci_assign_unassigned_bridge_resources() messed up the BIOS-configured
> resources and then didn't manage to reconfigure the bridges correctly, which led to
> unassigned BARs and hence the Thunderbolt/VGA issues.
>
> Something in the ACPI tables must be triggering the acpiphp hotplug path during boot.
> Can you dump the DSDT and SSDT tables and attach them to the BZ?
> PS:
> to dump the tables you can use this command from acpica-tools (not sure what it's called in Slackware):
> acpidump -b
> which will dump all tables in binary format (so attach those, or the versions decompiled with 'iasl -d')

Slackware doesn't provide acpidump AFAIK, but it was easy enough to compile it
from tools/power/acpi/tools/acpidump within the 5.15.72 tree I'd previously
used to build 5.15.72 (the kernel that's currently running). However,
running

acpidump -b

resulted in a segmentation fault. The root cause is line 339 in vsnprintf()
in utprint.c. With a workaround in place a functional acpidump was
obtained. I can provide further information from my analysis if it's
appropriate and you're interested.

For completeness I then tried the acpica source from Intel (version
20230628). The acpidump from this worked fine, and the files produced were
identical to those from my fixed kernel-tree acpidump. A tarball of these
is now attached to the Bugzilla report. Curiously enough, the corresponding
vsnprintf() line in the Intel acpica source distribution matches that in the
5.15.72 source. Perhaps the compiler flags in the Intel package happen to
prevent the segfault condition from occurring at runtime for me.

Regards
jonathan

2023-12-21 06:16:42

by Jonathan Woithe

Subject: Re: [Regression] Commit 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")

On Sat, Dec 16, 2023 at 10:05:22AM +1030, Jonathan Woithe wrote:
> On Fri, Dec 15, 2023 at 02:36:38PM +0100, Igor Mammedov wrote:
> > On Fri, 15 Dec 2023 21:57:43 +1030
> > Jonathan Woithe <[email protected]> wrote:
> > > As mentioned, testing another kernel can only happen next Thursday. If
> > > you would like other tests done let me know and I'll do them at the same
> > > time. I have remote access to the machine, so it's possible to retrieve
> > > information from it at any time.
> >
> > Let's wait until you can get logs with dyndbg='...' (which I asked for earlier)
> > and one more test with "pci=realloc" on the kernel command line to see if that helps.
>
> Okay.

I added the "dyndbg=" option to the 5.15.139 kernel command line and booted.
The resulting dmesg output has been attached to bugzilla 218268.

I also tested 5.15.139 with the "pci=realloc" kernel parameter. This was
sufficient to allow the system to boot without a GPU initialisation failure.
The dmesg output from this boot has also been attached to bugzilla 218268.

I used kernel.org's 5.15.139 for these tests because I already had it
compiled and was a little short of time. If you'd like me to repeat the
tests with a different kernel let me know which one and I'll see what I can
do. The Christmas break may delay this somewhat.

Regards
jonathan

2024-01-03 09:04:20

by Igor Mammedov

Subject: Re: [Regression] Commit 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")

On Thu, 21 Dec 2023 16:46:02 +1030
Jonathan Woithe <[email protected]> wrote:

> On Sat, Dec 16, 2023 at 10:05:22AM +1030, Jonathan Woithe wrote:
> > On Fri, Dec 15, 2023 at 02:36:38PM +0100, Igor Mammedov wrote:
> > > On Fri, 15 Dec 2023 21:57:43 +1030
> > > Jonathan Woithe <[email protected]> wrote:
> > > > As mentioned, testing another kernel can only happen next Thursday. If
> > > > you would like other tests done let me know and I'll do them at the same
> > > > time. I have remote access to the machine, so it's possible to retrieve
> > > > information from it at any time.
> > >
> > > Let's wait until you can get logs with dyndbg='...' (which I asked for earlier)
> > > and one more test with "pci=realloc" on the kernel command line to see if that helps.
> >
> > Okay.
>
> I added the "dyndbg=" option to the 5.15.139 kernel command line and booted.
> The resulting dmesg output has been attached to bugzilla 218268.
>
> I also tested 5.15.139 with the "pci=realloc" kernel parameter. This was
> sufficient to allow the system to boot without a GPU initialisation failure.
> The dmesg output from this boot has also been attached to bugzilla 218268.
>
> I used kernel.org's 5.15.139 for these tests because I already had it
> compiled and was a little short of time. If you'd like me to repeat the
> tests with a different kernel let me know which one and I'll see what I can
> do. The Christmas break may delay this somewhat.

Thanks for collecting the tables and debug logs.
I'll get back to you if more debug info or testing is needed.

> Regards
> jonathan
>