2023-08-13 03:57:17

by Dexuan Cui

[permalink] [raw]
Subject: [PATCH] PCI: hv: Fix a crash in hv_pci_restore_msi_msg() during hibernation

For a Linux VM with a NVIDIA GPU running on Hyper-V, before the GPU driver
is installed, hibernating the VM will trigger a panic: if the GPU driver
is not installed and loaded, MSI-X/MSI is not enabled on the device, so
pdev->dev.msi.data is NULL, and msi_lock_descs(&pdev->dev) causes the
NULL pointer dereference. Fix this by checking pdev->dev.msi.data.

Fixes: dc2b453290c4 ("PCI: hv: Rework MSI handling")
Signed-off-by: Dexuan Cui <[email protected]>
---
drivers/pci/controller/pci-hyperv.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 2d93d0c4f10d..fdd01bfb8e10 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3983,6 +3983,9 @@ static int hv_pci_restore_msi_msg(struct pci_dev *pdev, void *arg)
struct msi_desc *entry;
int ret = 0;

+ if (!pdev->dev.msi.data)
+ return 0;
+
msi_lock_descs(&pdev->dev);
msi_for_each_desc(entry, &pdev->dev, MSI_DESC_ASSOCIATED) {
irq_data = irq_get_irq_data(entry->irq);
--
2.25.1



2023-08-19 09:30:00

by Dexuan Cui

[permalink] [raw]
Subject: RE: [PATCH] PCI: hv: Fix a crash in hv_pci_restore_msi_msg() during hibernation

> From: Michael Kelley (LINUX) <[email protected]>
> Sent: Tuesday, August 15, 2023 5:35 PM
> > [...]
> > For a Linux VM with a NVIDIA GPU running on Hyper-V, before the GPU
> driver
> > is installed, hibernating the VM will trigger a panic: if the GPU driver
> > is not installed and loaded, MSI-X/MSI is not enabled on the device, so
> > pdev->dev.msi.data is NULL, and msi_lock_descs(&pdev->dev) causes the
> > NULL pointer dereference. Fix this by checking pdev->dev.msi.data.
>
> Is the scenario here a little broader than just the NVIDIA GPU driver? For
> any virtual PCI device that is presented in the guest VM as a VMBus device,
> the driver might not be installed. There could have been some initial
> problem getting the driver installed, or it might have been manually
> uninstalled later in the life of the VM. Also the host might have rescinded
> the virtual PCI device and added it back later, creating another opportunity
> where the driver might not be loaded. In any case, it seems like we could
> have the VMBus aspects of the device setup, but not the driver for the
> device. This suspend/resume code in pci-hyperv.c is all about handling
> the VMBus aspects of the device anyway.

Good point! The bug also affects other PCI devices, e.g. if I unload mlx5_core
and let the VM with a Mellanox VF NIC hibernate, I hit the same NULL
pointer dereference.

> Assuming my thinking is correct, is there some Hyper-V/VMBus setting
> owned by the pci-hyperv.c driver that would be better to test here than
> the low-level dev.msi.data pointer? The MSI code rework that added

IMO there is no easy and reliable way in Hyper-V/VMBus/pci-hyperv to
tell if MSI/MSI-X is enabled for a PCI device. We can potentially track the
MSI/MSI-X irqs in hv_compose_msi_msg() and hv_irq_unmask(), but
IMO that's not very easy and may be inaccurate.

> the descriptor lock encapsulates the internals with appropriate accessor
> functions, and reaching in to directly test dev.msi.data violates that
> encapsulation.

I agree.

Compared with:
if (!pdev->dev.msi.data)
return 0;

I think it's better to use this:
if (!pdev->msi_enabled && !pdev->msix_enabled)
return 0;

pdev-> msix_enabled has been used in many drivers, e.g.

"arch/x86/pci/xen.c": xen_pv_teardown_msi_irqs()
"drivers/hid/intel-ish-hid/ipc/pci-ish.c": ish_probe()
"drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c": pvrdma_intr0_handler()
"drivers/scsi/vmw_pvscsi.c": pvscsi_probe()
and more.

So it looks like pdev-> msix_enabled is a legit and stable API.
I'll post v2 with it. I'll update the changelog accordingly.
Please let me know if you have concerns about it.