2024-05-04 16:16:00

by Nam Cao

[permalink] [raw]
Subject: [PATCH v2 0/2] abort hot-plug if pci_hp_add_bridge() fails

Hi,

When I hot-plug a bridge, but there is no bus number available for its
downstreambus, the kernel crashes.

This can be reproduced with something like:

qemu-system-x86_64 -machine pc-q35-2.10 \
-kernel ../build-pci/arch/x86/boot/bzImage \
-drive "file=img,format=raw" \
-m 2048 -smp 2 -enable-kvm \
-append "console=ttyS0 root=/dev/sda" \
-nographic \
-device pcie-root-port,bus=pcie.0,id=rp1,slot=1,bus-reserve=0

Note the "bus-reserve=0": no bus number is reserved for hot-plugging a
bridge.

After booting is completed, a PCI bridge can be hot-added with the QEMU
command:
device_add pcie-pci-bridge,id=br1,bus=rp1

After this command, the kernel crashes (crash log below).

The reason is that, hot-plugging a bridge is done with pci_hp_add_bridge()
and this can fail. However, its returned value is not checked, and the
kernel proceeds despite the bridge was not added correctly. This results
in the crash.

Best regards,
Nam

[ 77.763860] pcieport 0000:00:03.0: pciehp: Slot(1): Button press: will power on in 5 sec
[ 77.765343] pcieport 0000:00:03.0: pciehp: Slot(1): Card present
[ 77.766385] pcieport 0000:00:03.0: pciehp: Slot(1): Link Up
[ 78.881224] pci 0000:01:00.0: [1b36:000e] type 01 class 0x060400 PCIe to PCI/PCI-X bridge
[ 78.883650] pci 0000:01:00.0: BAR 0 [mem 0x00000000-0x000000ff 64bit]
[ 78.884849] pci 0000:01:00.0: PCI bridge to [bus 00]
[ 78.886433] pci 0000:01:00.0: bridge window [io 0x0000-0x0fff]
[ 78.887541] pci 0000:01:00.0: bridge window [mem 0x00000000-0x000fffff]
[ 78.889479] pci 0000:01:00.0: bridge window [mem 0x00000000-0x000fffff 64bit pref]
[ 78.892464] pci 0000:01:00.0: No bus number available for hot-added bridge
[ 78.893717] pci 0000:01:00.0: BAR 0 [mem 0xfe800000-0xfe8000ff 64bit]: assigned
[ 78.895703] pcieport 0000:00:03.0: PCI bridge to [bus 01]
[ 78.896708] pcieport 0000:00:03.0: bridge window [io 0x1000-0x1fff]
[ 78.898878] pcieport 0000:00:03.0: bridge window [mem 0xfe800000-0xfe9fffff]
[ 78.900829] pcieport 0000:00:03.0: bridge window [mem 0xfe000000-0xfe1fffff 64bit pref]
[ 78.905378] shpchp 0000:01:00.0: HPC vendor_id 1b36 device_id e ss_vid 0 ss_did 0
[ 78.906729] shpchp 0000:01:00.0: enabling device (0000 -> 0002)
[ 78.910290] BUG: kernel NULL pointer dereference, address: 00000000000000da
[ 78.911539] #PF: supervisor write access in kernel mode
[ 78.912484] #PF: error_code(0x0002) - not-present page
[ 78.913407] PGD 0 P4D 0
[ 78.913871] Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 78.914652] CPU: 0 PID: 45 Comm: irq/24-pciehp Not tainted 6.8.6 #31
[ 78.915774] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 78.917395] RIP: 0010:shpc_init+0x3fb/0x9d0
[ 78.918162] Code: 8b 48 08 40 80 ff 02 0f 84 15 04 00 00 f7 c2 00 00 00 1f 0f 84 44 02 00 00 b8 04 00 00 00 b9 04 00 0f
[ 78.921407] RSP: 0018:ffffc9000018fad8 EFLAGS: 00010246
[ 78.922330] RAX: 0000000000000000 RBX: ffff88800459ab00 RCX: 0000000000000000
[ 78.923591] RDX: 00000000000000ff RSI: 0000000000000000 RDI: ffffffff83015701
[ 78.924845] RBP: ffffc9000018fb20 R08: ffff888003658280 R09: 0000000000000000
[ 78.926093] R10: 0000000000000000 R11: ffff888006888780 R12: ffff8880042ff000
[ 78.927358] R13: 0000000000000000 R14: 000000007f000d0f R15: 000000000000001f
[ 78.928622] FS: 0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[ 78.930040] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 78.931056] CR2: 00000000000000da CR3: 000000000471a000 CR4: 00000000000006f0
[ 78.932321] Call Trace:
[ 78.932770] <TASK>
[ 78.933175] ? show_regs+0x64/0x70
[ 78.933793] ? __die+0x23/0x70
[ 78.934351] ? page_fault_oops+0x17b/0x460
[ 78.935087] ? search_module_extables+0x18/0x60
[ 78.935907] ? shpc_init+0x3fb/0x9d0
[ 78.936548] ? kernelmode_fixup_or_oops+0x9d/0x120
[ 78.937420] ? __bad_area_nosemaphore+0x16b/0x220
[ 78.938272] ? bad_area_nosemaphore+0x11/0x20
[ 78.939068] ? do_user_addr_fault+0x28c/0x610
[ 78.939858] ? exc_page_fault+0x6e/0x160
[ 78.940566] ? asm_exc_page_fault+0x2b/0x30
[ 78.941332] ? shpc_init+0x3fb/0x9d0
[ 78.941976] ? shpc_init+0x569/0x9d0
[ 78.942618] shpc_probe+0x92/0x390
[ 78.943232] local_pci_probe+0x46/0xa0
[ 78.943922] pci_device_probe+0xb0/0x190
[ 78.944491] really_probe+0xc2/0x2d0
[ 78.944996] ? __pfx___device_attach_driver+0x10/0x10
[ 78.945693] __driver_probe_device+0x73/0x120
[ 78.946300] driver_probe_device+0x1f/0xf0
[ 78.946869] __device_attach_driver+0x8d/0x120
[ 78.947489] bus_for_each_drv+0x96/0xf0
[ 78.948031] __device_attach+0xae/0x1a0
[ 78.948571] device_attach+0xf/0x20
[ 78.949062] pci_bus_add_device+0x58/0x90
[ 78.949628] pci_bus_add_devices+0x30/0x70
[ 78.950201] pciehp_configure_device+0xa8/0x150
[ 78.950840] pciehp_handle_presence_or_link_change+0x161/0x4a0
[ 78.951655] pciehp_ist+0x20f/0x240
[ 78.952144] ? __pfx_irq_thread_fn+0x10/0x10
[ 78.952744] irq_thread_fn+0x23/0x60
[ 78.953245] irq_thread+0xfa/0x1c0
[ 78.953726] ? __pfx_irq_thread_dtor+0x10/0x10
[ 78.954346] ? __pfx_irq_thread+0x10/0x10
[ 78.955175] kthread+0xe0/0x110
[ 78.955631] ? __pfx_kthread+0x10/0x10
[ 78.956160] ret_from_fork+0x3c/0x60
[ 78.956661] ? __pfx_kthread+0x10/0x10
[ 78.957207] ret_from_fork_asm+0x1b/0x30
[ 78.957754] </TASK>
[ 78.958070] Modules linked in:
[ 78.958501] CR2: 00000000000000da
[ 78.958970] ---[ end trace 0000000000000000 ]---
[ 78.959615] RIP: 0010:shpc_init+0x3fb/0x9d0
[ 78.960201] Code: 8b 48 08 40 80 ff 02 0f 84 15 04 00 00 f7 c2 00 00 00 1f 0f 84 44 02 00 00 b8 04 00 00 00 b9 04 00 0f
[ 78.962745] RSP: 0018:ffffc9000018fad8 EFLAGS: 00010246
[ 78.963462] RAX: 0000000000000000 RBX: ffff88800459ab00 RCX: 0000000000000000
[ 78.964441] RDX: 00000000000000ff RSI: 0000000000000000 RDI: ffffffff83015701
[ 78.965469] RBP: ffffc9000018fb20 R08: ffff888003658280 R09: 0000000000000000
[ 78.966537] R10: 0000000000000000 R11: ffff888006888780 R12: ffff8880042ff000
[ 78.967531] R13: 0000000000000000 R14: 000000007f000d0f R15: 000000000000001f
[ 78.968539] FS: 0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[ 78.969662] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 78.970472] CR2: 00000000000000da CR3: 000000000471a000 CR4: 00000000000006f0
[ 78.971449] note: irq/24-pciehp[45] exited with irqs disabled
[ 78.972281] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 78.973232] #PF: supervisor instruction fetch in kernel mode
[ 78.974012] #PF: error_code(0x0010) - not-present page
[ 78.974717] PGD 0 P4D 0
[ 78.975075] Oops: 0010 [#2] PREEMPT SMP NOPTI
[ 78.975686] CPU: 0 PID: 45 Comm: irq/24-pciehp Tainted: G D 6.8.6 #31
[ 78.976751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 78.978011] RIP: 0010:0x0
[ 78.978383] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[ 78.979274] RSP: 0018:ffffc9000018fe98 EFLAGS: 00010286
[ 78.979996] RAX: 0000000000000000 RBX: ffff888003d2c740 RCX: 00000000000001c0
[ 78.980973] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffc9000018fea0
[ 78.981949] RBP: ffffc9000018feb8 R08: 0000000000009ffb R09: 00000000ffffdfff
[ 78.982924] R10: 0000000000000001 R11: ffffffff82a58aa0 R12: ffff888003d2c740
[ 78.983901] R13: ffff888003d2cf54 R14: ffff888003e78001 R15: 0000000000000000
[ 78.984884] FS: 0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[ 78.985992] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 78.986786] CR2: ffffffffffffffd6 CR3: 000000000471a000 CR4: 00000000000006f0
[ 78.987793] Call Trace:
[ 78.988143] <TASK>
[ 78.988443] ? show_regs+0x64/0x70
[ 78.988924] ? __die+0x23/0x70
[ 78.989355] ? page_fault_oops+0x17b/0x460
[ 78.989925] ? do_user_addr_fault+0x2d1/0x610
[ 78.990539] ? _prb_read_valid+0x2e6/0x370
[ 78.991132] ? exc_page_fault+0x6e/0x160
[ 78.991688] ? asm_exc_page_fault+0x2b/0x30
[ 78.992318] task_work_run+0x60/0x90
[ 78.992798] do_exit+0x355/0xb00
[ 78.993234] make_task_dead+0x7e/0x160
[ 78.993708] rewind_stack_and_make_dead+0x17/0x20
[ 78.994303] </TASK>
[ 78.994589] Modules linked in:
[ 78.994984] CR2: 0000000000000000
[ 78.995408] ---[ end trace 0000000000000000 ]---
[ 78.995984] RIP: 0010:shpc_init+0x3fb/0x9d0
[ 78.996503] Code: 8b 48 08 40 80 ff 02 0f 84 15 04 00 00 f7 c2 00 00 00 1f 0f 84 44 02 00 00 b8 04 00 00 00 b9 04 00 0f
[ 78.998828] RSP: 0018:ffffc9000018fad8 EFLAGS: 00010246
[ 78.999500] RAX: 0000000000000000 RBX: ffff88800459ab00 RCX: 0000000000000000
[ 79.000406] RDX: 00000000000000ff RSI: 0000000000000000 RDI: ffffffff83015701
[ 79.001282] RBP: ffffc9000018fb20 R08: ffff888003658280 R09: 0000000000000000
[ 79.002159] R10: 0000000000000000 R11: ffff888006888780 R12: ffff8880042ff000
[ 79.003036] R13: 0000000000000000 R14: 000000007f000d0f R15: 000000000000001f
[ 79.003926] FS: 0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[ 79.004921] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 79.005637] CR2: ffffffffffffffd6 CR3: 000000000471a000 CR4: 00000000000006f0
[ 79.006523] note: irq/24-pciehp[45] exited with irqs disabled
[ 79.007261] Fixing recursive fault but reboot is needed!
[ 79.007942] BUG: scheduling while atomic: irq/24-pciehp/45/0x00000000
[ 79.008740] Modules linked in:
[ 79.009151] CPU: 0 PID: 45 Comm: irq/24-pciehp Tainted: G D 6.8.6 #31
[ 79.010117] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 79.011255] Call Trace:
[ 79.011573] <TASK>
[ 79.011845] dump_stack_lvl+0x5f/0x80
[ 79.012310] dump_stack+0x14/0x20
[ 79.012730] __schedule_bug+0x51/0x70
[ 79.013195] __schedule+0x79c/0x890
[ 79.013634] ? vprintk+0x31/0x40
[ 79.014044] ? _printk+0x5f/0x80
[ 79.014456] do_task_dead+0x43/0x50
[ 79.014897] make_task_dead+0x142/0x160
[ 79.015378] rewind_stack_and_make_dead+0x17/0x20
[ 79.015971] </TASK>

Nam Cao (2):
PCI: shpchp: Abort hot-plug if pci_hp_add_bridge() fails
PCI: pciehp: Abort hot-plug if pci_hp_add_bridge() fails

drivers/pci/hotplug/pciehp_pci.c | 6 ++++--
drivers/pci/hotplug/shpchp_pci.c | 6 ++++--
2 files changed, 8 insertions(+), 4 deletions(-)

--
2.39.2



2024-05-04 16:15:59

by Nam Cao

[permalink] [raw]
Subject: [PATCH v2 2/2] PCI: pciehp: Abort hot-plug if pci_hp_add_bridge() fails

If a bridge is hot-added without any bus number available for its
downstream bus, pci_hp_add_bridge() will fail. However, the driver
proceeds regardless, and the kernel crashes.

This crash can be reproduced with the QEMU command:
qemu-system-x86_64 -machine pc-q35-2.10 \
-kernel bzImage \
-drive "file=img,format=raw" \
-m 2048 -smp 2 -enable-kvm \
-append "console=ttyS0 root=/dev/sda" \
-nographic \
-device pcie-root-port,bus=pcie.0,id=rp1,slot=1,bus-reserve=0

then hot-plug a bridge at runtime with the QEMU command:
device_add pcie-pci-bridge,id=br1,bus=rp1

and the kernel crashes:

pcieport 0000:00:03.0: pciehp: Slot(1): Button press: will power on in 5 sec
pcieport 0000:00:03.0: pciehp: Slot(1): Card present
pcieport 0000:00:03.0: pciehp: Slot(1): Link Up
pci 0000:01:00.0: [1b36:000e] type 01 class 0x060400 PCIe to PCI/PCI-X bridge
pci 0000:01:00.0: BAR 0 [mem 0x00000000-0x000000ff 64bit]
pci 0000:01:00.0: PCI bridge to [bus 00]
pci 0000:01:00.0: bridge window [io 0x0000-0x0fff]
pci 0000:01:00.0: bridge window [mem 0x00000000-0x000fffff]
pci 0000:01:00.0: bridge window [mem 0x00000000-0x000fffff 64bit pref]
pci 0000:01:00.0: No bus number available for hot-added bridge

(note: kernel should abort hot-plugging right here)

pci 0000:01:00.0: BAR 0 [mem 0xfe800000-0xfe8000ff 64bit]: assigned
pcieport 0000:00:03.0: PCI bridge to [bus 01]
pcieport 0000:00:03.0: bridge window [io 0x1000-0x1fff]
pcieport 0000:00:03.0: bridge window [mem 0xfe800000-0xfe9fffff]
pcieport 0000:00:03.0: bridge window [mem 0xfe000000-0xfe1fffff 64bit pref]
shpchp 0000:01:00.0: HPC vendor_id 1b36 device_id e ss_vid 0 ss_did 0
shpchp 0000:01:00.0: enabling device (0000 -> 0002)
BUG: kernel NULL pointer dereference, address: 00000000000000da
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 46 Comm: irq/24-pciehp Not tainted 6.9.0-rc1-00001-g2e0239d47d75 #33
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:shpc_init+0x3fb/0x9d0
[stack dump and register dump cut out]

Fix this by aborting the hot-plug if pci_hp_add_bridge() fails.

Fixes: 0eb3bcfd088e ("[PATCH] pciehp: allow bridged card hotplug")
Signed-off-by: Nam Cao <[email protected]>
Cc: <[email protected]>
---
v2:
- remove "Cc: Rajesh Shah <[email protected]>" (this address bounce)
- add more information to commit message
- return 0 instead of -EINVAL
- remove pci_stop_and_remove_bus_device()

drivers/pci/hotplug/pciehp_pci.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
index ad12515a4a12..200a7f4a12e0 100644
--- a/drivers/pci/hotplug/pciehp_pci.c
+++ b/drivers/pci/hotplug/pciehp_pci.c
@@ -58,8 +58,10 @@ int pciehp_configure_device(struct controller *ctrl)
goto out;
}

- for_each_pci_bridge(dev, parent)
- pci_hp_add_bridge(dev);
+ for_each_pci_bridge(dev, parent) {
+ if (pci_hp_add_bridge(dev))
+ goto out;
+ }

pci_assign_unassigned_bridge_resources(bridge);
pcie_bus_configure_settings(parent);
--
2.39.2


2024-05-05 05:45:28

by Lukas Wunner

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] PCI: pciehp: Abort hot-plug if pci_hp_add_bridge() fails

[cc += Ilpo, Mika]

On Sat, May 04, 2024 at 06:15:22PM +0200, Nam Cao wrote:
> If a bridge is hot-added without any bus number available for its
> downstream bus, pci_hp_add_bridge() will fail. However, the driver
> proceeds regardless, and the kernel crashes.
[...]
> Fix this by aborting the hot-plug if pci_hp_add_bridge() fails.
[...]
> --- a/drivers/pci/hotplug/pciehp_pci.c
> +++ b/drivers/pci/hotplug/pciehp_pci.c
> @@ -58,8 +58,10 @@ int pciehp_configure_device(struct controller *ctrl)
> goto out;
> }
>
> - for_each_pci_bridge(dev, parent)
> - pci_hp_add_bridge(dev);
> + for_each_pci_bridge(dev, parent) {
> + if (pci_hp_add_bridge(dev))
> + goto out;
> + }
>
> pci_assign_unassigned_bridge_resources(bridge);
> pcie_bus_configure_settings(parent);

Are the curly braces even necessary?

FWIW, the rationale for returning 0 (success) in this case is that
pciehp has done its job by bringing up the slot and enumerating the
bridge in the slot. It's not pciehp's fault that the hierarchy
cannot be extended further below the hot-added bridge.

Have you gone through the testing steps you spoke of earlier
(replacing the hot-added bridge with an Ethernet card) and do
they work correctly with this patch?

Reviewed-by: Lukas Wunner <[email protected]>

Thanks,

Lukas

2024-05-05 07:15:10

by Nam Cao

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] PCI: pciehp: Abort hot-plug if pci_hp_add_bridge() fails

On Sun, May 05, 2024 at 07:45:13AM +0200, Lukas Wunner wrote:
> On Sat, May 04, 2024 at 06:15:22PM +0200, Nam Cao wrote:
> > If a bridge is hot-added without any bus number available for its
> > downstream bus, pci_hp_add_bridge() will fail. However, the driver
> > proceeds regardless, and the kernel crashes.
> [...]
> > Fix this by aborting the hot-plug if pci_hp_add_bridge() fails.
> [...]
> > --- a/drivers/pci/hotplug/pciehp_pci.c
> > +++ b/drivers/pci/hotplug/pciehp_pci.c
> > @@ -58,8 +58,10 @@ int pciehp_configure_device(struct controller *ctrl)
> > goto out;
> > }
> >
> > - for_each_pci_bridge(dev, parent)
> > - pci_hp_add_bridge(dev);
> > + for_each_pci_bridge(dev, parent) {
> > + if (pci_hp_add_bridge(dev))
> > + goto out;
> > + }
> >
> > pci_assign_unassigned_bridge_resources(bridge);
> > pcie_bus_configure_settings(parent);
>
> Are the curly braces even necessary?

Nope. I thought this is the kernel's coding style, since checkpatch.pl
didn't complain. But checkpatch also doesn't complain after I remove it,
so no I guess that's not necessary.

> FWIW, the rationale for returning 0 (success) in this case is that
> pciehp has done its job by bringing up the slot and enumerating the
> bridge in the slot. It's not pciehp's fault that the hierarchy
> cannot be extended further below the hot-added bridge.
>
> Have you gone through the testing steps you spoke of earlier
> (replacing the hot-added bridge with an Ethernet card) and do
> they work correctly with this patch?

Yes.

> Reviewed-by: Lukas Wunner <[email protected]>

Thanks for the review. I will send v3 with the brackets removed.

Best regards,
Nam

2024-05-06 08:37:27

by Nam Cao

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] PCI: pciehp: Abort hot-plug if pci_hp_add_bridge() fails

On Sun, May 05, 2024 at 09:15:00AM +0200, Nam Cao wrote:
> On Sun, May 05, 2024 at 07:45:13AM +0200, Lukas Wunner wrote:
> > Have you gone through the testing steps you spoke of earlier
> > (replacing the hot-added bridge with an Ethernet card) and do
> > they work correctly with this patch?
>
> Yes.

I just discovered a new crash scenario with shpchp:

First, hot-add a bridge:
(qemu) device_add pci-bridge,id=br2,bus=br1,chassis_nr=1,addr=1

But with the current patch, the hot-plug is still successful, just that the
bridge is not properly configured:
$ lspci -t
-[0000:00]-+-00.0
+-01.0
+-02.0
+-03.0-[01-02]----00.0-[02]----01.0-- <-- new hot-added bridge
+-1f.0
+-1f.2
\-1f.3

But! Now I leave the hot-added bridge as is, and hot-add an ethernet card:
(qemu) device_add e1000,bus=br1,id=eth0,addr=2

this command crashes the kernel (full crash log below).

The problem is because during hot-plugging of this ethernet card,
pci_bus_add_devices() is invoked and the previously hot-plugged bridge hasn't
been marked as "added" yet, so the driver attempts to add this bridge
again, leading to the crash.

Now for pciehp, this scenario cannot happen, because there is only a single
slot, so we cannot hot-plug another device. But still, this makes me think
perhaps we shouldn't leave the hot-plugged bridge in a improperly-configured
state as this patch is doing. I cannot think of a scenario that would crash pciehp
similarly to shpchp. But that's just me, maybe there is a rare scenario
that can crash pciehp, but I just haven't seen yet.

My point is: better safe than sorry. I propose going back to the original
solution: calling pci_stop_and_remove_bus_device() and return a negative
error, and get rid of this improperly configured bridge. It is useless
anyways without a subordinate bus number.

What do you think?

Best regards,
Nam

(qemu) device_add e1000,bus=br1,id=eth0,addr=2
[ 140.536933] shpchp 0000:01:00.0: Latch close on Slot(2)
[ 140.538466] shpchp 0000:01:00.0: Button pressed on Slot(2)
[ 140.539992] shpchp 0000:01:00.0: Card present on Slot(2)
[ 140.541600] shpchp 0000:01:00.0: PCI slot #2 - powering on due to button press
[ 146.604107] pci 0000:02:02.0: [8086:100e] type 00 class 0x020000 conventional PCI endpoint
[ 146.606679] pci 0000:02:02.0: BAR 0 [mem 0x00000000-0x0001ffff]
[ 146.608438] pci 0000:02:02.0: BAR 1 [io 0x0000-0x003f]
[ 146.610340] pci 0000:02:02.0: ROM [mem 0x00000000-0x0003ffff pref]
[ 146.612326] pci 0000:02:02.0: vgaarb: pci_notify
[ 146.613730] pci 0000:02:02.0: ROM [mem 0xfe600000-0xfe63ffff pref]: assigned
[ 146.615705] pci 0000:02:02.0: BAR 0 [mem 0xfe640000-0xfe65ffff]: assigned
[ 146.617685] pci 0000:02:01.0: BAR 0 [mem 0xfe660000-0xfe6600ff 64bit]: assigned
[ 146.621397] pci 0000:02:02.0: BAR 1 [io 0xc000-0xc03f]: assigned
[ 146.623184] shpchp 0000:01:00.0: PCI bridge to [bus 02]
[ 146.624704] shpchp 0000:01:00.0: bridge window [io 0xc000-0xcfff]
[ 146.628853] shpchp 0000:01:00.0: bridge window [mem 0xfe600000-0xfe7fffff]
[ 146.632415] shpchp 0000:01:00.0: bridge window [mem 0xfe000000-0xfe1fffff 64bit pref]
[ 146.637717] pcieport 0000:02:01.0: vgaarb: pci_notify
[ 146.639172] pcieport 0000:02:01.0: runtime IRQ mapping not provided by arch
[ 146.641635] pcieport 0000:02:01.0: vgaarb: pci_notify
[ 146.643093] shpchp 0000:02:01.0: vgaarb: pci_notify
[ 146.644501] shpchp 0000:02:01.0: runtime IRQ mapping not provided by arch
[ 146.647967] shpchp 0000:02:01.0: HPC vendor_id 1b36 device_id 1 ss_vid 0 ss_did 0
[ 146.650106] shpchp 0000:02:01.0: enabling device (0000 -> 0002)
[ 146.653840] ACPI: \_SB_.GSIE: Enabled at IRQ 20
[ 146.656699] shpchp 0000:02:01.0: enabling bus mastering
[ 146.659567] BUG: kernel NULL pointer dereference, address: 00000000000000da
[ 146.661532] #PF: supervisor write access in kernel mode
[ 146.663006] #PF: error_code(0x0002) - not-present page
[ 146.664453] PGD 0 P4D 0
[ 146.665208] Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 146.666447] CPU: 1 PID: 35 Comm: kworker/1:1 Not tainted 6.9.0-rc1-00003-g5ef667cf77c0 #59
[ 146.668743] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 146.671314] Workqueue: shpchp-2 shpchp_pushbutton_thread
[ 146.672817] RIP: 0010:shpc_init+0x3fb/0x9d0
[ 146.674282] Code: 8b 48 08 40 80 ff 02 0f 84 15 04 00 00 f7 c2 00 00 00 1f 0f 84 44 02 00 00 b8 04 00 00 00 b9 04 00 0f
[ 146.679118] RSP: 0018:ffffc90000133ab0 EFLAGS: 00010246
[ 146.680245] RAX: 0000000000000000 RBX: ffff8880056fd600 RCX: 0000000000000000
[ 146.681782] RDX: 00000000000000ff RSI: 0000000000000000 RDI: ffffffff83059101
[ 146.683307] RBP: ffffc90000133af8 R08: ffff88800364f500 R09: 0000000000000000
[ 146.684831] R10: 0000000000000000 R11: ffff88800470b840 R12: ffff888006af7000
[ 146.686361] R13: 0000000000000000 R14: 000000007f000d0f R15: 000000000000001f
[ 146.687882] FS: 0000000000000000(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
[ 146.689597] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 146.690821] CR2: 00000000000000da CR3: 000000000432a000 CR4: 00000000000006f0
[ 146.692331] Call Trace:
[ 146.692869] <TASK>
[ 146.693351] ? show_regs+0x63/0x70
[ 146.694097] ? __die+0x23/0x70
[ 146.694771] ? page_fault_oops+0x17a/0x470
[ 146.695659] ? search_module_extables+0x18/0x60
[ 146.696644] ? shpc_init+0x3fb/0x9d0
[ 146.697435] ? kernelmode_fixup_or_oops+0x9b/0x120
[ 146.698474] ? __bad_area_nosemaphore+0x16e/0x240
[ 146.699495] ? bad_area_nosemaphore+0x11/0x20
[ 146.700442] ? do_user_addr_fault+0x2a8/0x610
[ 146.701405] ? exc_page_fault+0x6d/0x160
[ 146.702263] ? asm_exc_page_fault+0x2b/0x30
[ 146.703171] ? shpc_init+0x3fb/0x9d0
[ 146.704213] shpc_probe+0x92/0x390
[ 146.704880] local_pci_probe+0x46/0xa0
[ 146.705626] pci_device_probe+0xb0/0x190
[ 146.706386] really_probe+0xc7/0x390
[ 146.707080] ? __pfx___device_attach_driver+0x10/0x10
[ 146.708044] __driver_probe_device+0x78/0x150
[ 146.708880] driver_probe_device+0x1f/0x90
[ 146.709679] __device_attach_driver+0x93/0x120
[ 146.710538] bus_for_each_drv+0x96/0xf0
[ 146.711277] __device_attach+0xb1/0x1c0
[ 146.712018] device_attach+0xf/0x20
[ 146.712695] pci_bus_add_device+0x58/0x90
[ 146.713480] pci_bus_add_devices+0x30/0x70
[ 146.714292] shpchp_configure_device+0xd8/0x150
[ 146.715200] board_added+0x115/0x420
[ 146.715898] shpchp_enable_slot+0x114/0x3b0
[ 146.716711] shpchp_pushbutton_thread+0x77/0xa0
[ 146.717605] process_one_work+0x153/0x390
[ 146.718387] worker_thread+0x2c9/0x400
[ 146.719111] ? __pfx_worker_thread+0x10/0x10
[ 146.719945] kthread+0xd7/0x100
[ 146.720566] ? __pfx_kthread+0x10/0x10
[ 146.721306] ret_from_fork+0x3c/0x60
[ 146.722006] ? __pfx_kthread+0x10/0x10
[ 146.722734] ret_from_fork_asm+0x1a/0x30
[ 146.723498] </TASK>
[ 146.723931] Modules linked in:
[ 146.724529] CR2: 00000000000000da
[ 146.725182] ---[ end trace 0000000000000000 ]---
[ 146.726065] RIP: 0010:shpc_init+0x3fb/0x9d0
[ 146.726862] Code: 8b 48 08 40 80 ff 02 0f 84 15 04 00 00 f7 c2 00 00 00 1f 0f 84 44 02 00 00 b8 04 00 00 00 b9 04 00 0f
[ 146.730372] RSP: 0018:ffffc90000133ab0 EFLAGS: 00010246
[ 146.731363] RAX: 0000000000000000 RBX: ffff8880056fd600 RCX: 0000000000000000
[ 146.732714] RDX: 00000000000000ff RSI: 0000000000000000 RDI: ffffffff83059101
[ 146.734072] RBP: ffffc90000133af8 R08: ffff88800364f500 R09: 0000000000000000
[ 146.735429] R10: 0000000000000000 R11: ffff88800470b840 R12: ffff888006af7000
[ 146.736779] R13: 0000000000000000 R14: 000000007f000d0f R15: 000000000000001f
[ 146.738143] FS: 0000000000000000(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
[ 146.739673] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 146.740764] CR2: 00000000000000da CR3: 000000000432a000 CR4: 00000000000006f0
[ 146.742116] note: kworker/1:1[35] exited with irqs disabled
[ 146.743232] kworker/1:1 (35) used greatest stack depth: 12352 bytes left

2024-05-06 19:37:05

by Lukas Wunner

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] PCI: pciehp: Abort hot-plug if pci_hp_add_bridge() fails

On Mon, May 06, 2024 at 10:37:01AM +0200, Nam Cao wrote:
> I just discovered a new crash scenario with shpchp:
>
> First, hot-add a bridge:
> (qemu) device_add pci-bridge,id=br2,bus=br1,chassis_nr=1,addr=1
>
> But with the current patch, the hot-plug is still successful, just that the
> bridge is not properly configured:
> $ lspci -t
> -[0000:00]-+-00.0
> +-01.0
> +-02.0
> +-03.0-[01-02]----00.0-[02]----01.0-- <-- new hot-added bridge
> +-1f.0
> +-1f.2
> \-1f.3
>
> But! Now I leave the hot-added bridge as is, and hot-add an ethernet card:
> (qemu) device_add e1000,bus=br1,id=eth0,addr=2
>
> this command crashes the kernel (full crash log below).
>
> The problem is because during hot-plugging of this ethernet card,
> pci_bus_add_devices() is invoked and the previously hot-plugged bridge hasn't
> been marked as "added" yet, so the driver attempts to add this bridge
> again, leading to the crash.
>
> Now for pciehp, this scenario cannot happen, because there is only a single
> slot, so we cannot hot-plug another device. But still, this makes me think
> perhaps we shouldn't leave the hot-plugged bridge in a improperly-configured
> state as this patch is doing. I cannot think of a scenario that would crash pciehp
> similarly to shpchp. But that's just me, maybe there is a rare scenario
> that can crash pciehp, but I just haven't seen yet.
>
> My point is: better safe than sorry. I propose going back to the original
> solution: calling pci_stop_and_remove_bus_device() and return a negative
> error, and get rid of this improperly configured bridge. It is useless
> anyways without a subordinate bus number.
>
> What do you think?

When the kernel runs out of BAR address space for devices downstream of
a bridge, it doesn't de-enumerate the bridge. The devices are just
unusable.

So my first intuition is that running out of bus numbers shouldn't result
in de-enumeration either.

Remind me, how exactly does the NULL pointer deref occur? I think it's
because no struct pci_bus was allocated for the subordinate bus of the
hot-plugged bridge, right? Because AFAICS that would happen in

pci_hp_add_bridge()
pci_can_bridge_extend()
pci_add_new_bus()
pci_alloc_child_bus()

but we never get that far because pci_hp_add_bridge() bails out with -1.
So the subordinate pointer remains a NULL pointer.

Perhaps we should allocate an empty struct pci_bus instead.
Or check for a NULL subordinate pointer instead of crashing.

I can't really say off the top of my head which solution is appropriate
here, it requires going through the code paths and identifying the
complexity associated with each solution.

Thanks,

Lukas

2024-05-07 14:28:16

by Nam Cao

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] PCI: pciehp: Abort hot-plug if pci_hp_add_bridge() fails

On Mon, May 06, 2024 at 09:36:44PM +0200, Lukas Wunner wrote:
> On Mon, May 06, 2024 at 10:37:01AM +0200, Nam Cao wrote:
> > I just discovered a new crash scenario with shpchp:
> >
> > First, hot-add a bridge:
> > (qemu) device_add pci-bridge,id=br2,bus=br1,chassis_nr=1,addr=1
> >
> > But with the current patch, the hot-plug is still successful, just that the
> > bridge is not properly configured:
> > $ lspci -t
> > -[0000:00]-+-00.0
> > +-01.0
> > +-02.0
> > +-03.0-[01-02]----00.0-[02]----01.0-- <-- new hot-added bridge
> > +-1f.0
> > +-1f.2
> > \-1f.3
> >
> > But! Now I leave the hot-added bridge as is, and hot-add an ethernet card:
> > (qemu) device_add e1000,bus=br1,id=eth0,addr=2
> >
> > this command crashes the kernel (full crash log below).
> >
> > The problem is because during hot-plugging of this ethernet card,
> > pci_bus_add_devices() is invoked and the previously hot-plugged bridge hasn't
> > been marked as "added" yet, so the driver attempts to add this bridge
> > again, leading to the crash.
> >
> > Now for pciehp, this scenario cannot happen, because there is only a single
> > slot, so we cannot hot-plug another device. But still, this makes me think
> > perhaps we shouldn't leave the hot-plugged bridge in a improperly-configured
> > state as this patch is doing. I cannot think of a scenario that would crash pciehp
> > similarly to shpchp. But that's just me, maybe there is a rare scenario
> > that can crash pciehp, but I just haven't seen yet.
> >
> > My point is: better safe than sorry. I propose going back to the original
> > solution: calling pci_stop_and_remove_bus_device() and return a negative
> > error, and get rid of this improperly configured bridge. It is useless
> > anyways without a subordinate bus number.
> >
> > What do you think?
>
> When the kernel runs out of BAR address space for devices downstream of
> a bridge, it doesn't de-enumerate the bridge. The devices are just
> unusable.
>
> So my first intuition is that running out of bus numbers shouldn't result
> in de-enumeration either.

The BAR case is a bit different: it is a legal configuration for a bridge
to have no address space downstream: we can configure the bridge with
limit<base to indicate that there is no downstream addresses. There is
nothing similar for secondary bus number: there is no legal bus number
configuration for the bridge in this case.

> Remind me, how exactly does the NULL pointer deref occur? I think it's
> because no struct pci_bus was allocated for the subordinate bus of the
> hot-plugged bridge, right? Because AFAICS that would happen in
>
> pci_hp_add_bridge()
> pci_can_bridge_extend()
> pci_add_new_bus()
> pci_alloc_child_bus()
>
> but we never get that far because pci_hp_add_bridge() bails out with -1.
> So the subordinate pointer remains a NULL pointer.

This is correct. NULL deference happens due to subordinate pointer being
NULL.

> Perhaps we should allocate an empty struct pci_bus instead.

This feels a bit hacky. What bus number could we set to this struct? And I
doubt that the PCI subsystem is written with the expectation that struct pci_bus
can be a dummy one, so I would guess that the probability of breaking thing
is high with this approach.

> Or check for a NULL subordinate pointer instead of crashing.

I think this is a possible solution, but it is a bit complicated: all usage
of subordinate pointers will need to be looked at. Further more, secondary bus
number being zero is currently taken as unconfigured/invalid (for example
in pci_scan_bridge_extend()), that needs to be changed as well. Doing this
has a good chance of breaking things.

I don't really see the upside of the above, compared to just deleting this
bridge. It is not really counter-intuitive that the OS rejects a
hot-plugged device that cannot be configured, right? Also this solution is
wayyy simpler. Unless there is problem with this that I am not seeing?

Best regards,
Nam

2024-05-27 09:16:23

by Lukas Wunner

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] PCI: pciehp: Abort hot-plug if pci_hp_add_bridge() fails

On Tue, May 07, 2024 at 04:27:38PM +0200, Nam Cao wrote:
> On Mon, May 06, 2024 at 09:36:44PM +0200, Lukas Wunner wrote:
> > Remind me, how exactly does the NULL pointer deref occur? I think it's
> > because no struct pci_bus was allocated for the subordinate bus of the
> > hot-plugged bridge, right? Because AFAICS that would happen in
> >
> > pci_hp_add_bridge()
> > pci_can_bridge_extend()
> > pci_add_new_bus()
> > pci_alloc_child_bus()
> >
> > but we never get that far because pci_hp_add_bridge() bails out with -1.
> > So the subordinate pointer remains a NULL pointer.
>
> This is correct. NULL deference happens due to subordinate pointer being
> NULL.
>
> > Or check for a NULL subordinate pointer instead of crashing.
>
> I think this is a possible solution, but it is a bit complicated: all usage
> of subordinate pointers will need to be looked at.

We already check for a NULL subordinate pointer in various places.
See e.g. commit 62e4492c3063 ("PCI: Prevent NULL dereference during
pciehp probe").

If we're missing such checks, I'd suggest to add those.

If you believe having a NULL subordinate pointer is wrong and the
bridge should be de-enumerated altogether, I think you would have
to remove these NULL pointer checks as they'd otherwise become
pointless with your change.

Just adding missing NULL pointer checks seems to be the most
straightforward solution to me.

Thanks,

Lukas

2024-05-27 09:23:38

by Nam Cao

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] PCI: pciehp: Abort hot-plug if pci_hp_add_bridge() fails

On Mon, May 27, 2024 at 11:15:55AM +0200, Lukas Wunner wrote:
> On Tue, May 07, 2024 at 04:27:38PM +0200, Nam Cao wrote:
> > On Mon, May 06, 2024 at 09:36:44PM +0200, Lukas Wunner wrote:
> > > Remind me, how exactly does the NULL pointer deref occur? I think it's
> > > because no struct pci_bus was allocated for the subordinate bus of the
> > > hot-plugged bridge, right? Because AFAICS that would happen in
> > >
> > > pci_hp_add_bridge()
> > > pci_can_bridge_extend()
> > > pci_add_new_bus()
> > > pci_alloc_child_bus()
> > >
> > > but we never get that far because pci_hp_add_bridge() bails out with -1.
> > > So the subordinate pointer remains a NULL pointer.
> >
> > This is correct. NULL deference happens due to subordinate pointer being
> > NULL.
> >
> > > Or check for a NULL subordinate pointer instead of crashing.
> >
> > I think this is a possible solution, but it is a bit complicated: all usage
> > of subordinate pointers will need to be looked at.
>
> We already check for a NULL subordinate pointer in various places.
> See e.g. commit 62e4492c3063 ("PCI: Prevent NULL dereference during
> pciehp probe").

Ah, so bridge without subordinate bus is allowed in the kernel.

> If we're missing such checks, I'd suggest to add those.
>
> If you believe having a NULL subordinate pointer is wrong and the
> bridge should be de-enumerated altogether, I think you would have
> to remove these NULL pointer checks as they'd otherwise become
> pointless with your change.
>
> Just adding missing NULL pointer checks seems to be the most
> straightforward solution to me.

If the kernel do permits bridges without subordinate bus number, I am
happy to go this direction. I expect going this way will require many more
patches, I will dig into it and come back later.

Thanks for your patience providing me advices & information.

Best regards,
Nam




2024-05-27 12:42:39

by Lukas Wunner

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] PCI: pciehp: Abort hot-plug if pci_hp_add_bridge() fails

On Mon, May 27, 2024 at 11:23:22AM +0200, Nam Cao wrote:
> On Mon, May 27, 2024 at 11:15:55AM +0200, Lukas Wunner wrote:
> > We already check for a NULL subordinate pointer in various places.
> > See e.g. commit 62e4492c3063 ("PCI: Prevent NULL dereference during
> > pciehp probe").
>
> Ah, so bridge without subordinate bus is allowed in the kernel.
>
> > If we're missing such checks, I'd suggest to add those.
> >
> > If you believe having a NULL subordinate pointer is wrong and the
> > bridge should be de-enumerated altogether, I think you would have
> > to remove these NULL pointer checks as they'd otherwise become
> > pointless with your change.
> >
> > Just adding missing NULL pointer checks seems to be the most
> > straightforward solution to me.
>
> If the kernel do permits bridges without subordinate bus number, I am
> happy to go this direction. I expect going this way will require many more
> patches, I will dig into it and come back later.

It seems a lot of functions use a "if (dev->subordinate)" check
to distinguish between bridges and endpoints.

A bridge with a NULL subordinate pointer is then basically treated
like an endpoint.

There are places which use other ways to recognize bridges, e.g.
by looking at dev->hdr_type. Only these code paths will have to
check for a NULL subordinate pointer. In the case of pciehp,
portdrv binds to anything with class PCI_CLASS_BRIDGE_PCI_NORMAL,
without consideration for the subordinate pointer, hence a check
was needed in pciehp.

pciehp is used a lot by Thunderbolt/USB4, SD 7.0 and for NVMe drives.
I think shpchp isn't used as much, so its code paths aren't exercised
as heavily and bugs aren't found and fixed as quickly. So chances are
that more checks are missing in shpchp than in pciehp.

Thanks,

Lukas