2022-04-25 13:25:17

by Abhishek Sahu

[permalink] [raw]
Subject: [PATCH v3 0/8] vfio/pci: power management changes

Currently, there is very limited power management support available
in the upstream vfio-pci driver. If there is no user of vfio-pci device,
then it will be moved into D3Hot state. Similarly, if we enable the
runtime power management for vfio-pci device in the guest OS, then the
device is being runtime suspended (for linux guest OS) and the PCI
device will be put into D3hot state (in function
vfio_pm_config_write()). If the D3cold state can be used instead of
D3hot, then it will help in saving maximum power. The D3cold state can't
be possible with native PCI PM. It requires interaction with platform
firmware which is system-specific. To go into low power states
(including D3cold), the runtime PM framework can be used which
internally interacts with PCI and platform firmware and puts the device
into the lowest possible D-States. This patch series registers the
vfio-pci driver with runtime PM framework and uses the same for moving
the physical PCI device to go into the low power state.

The current PM support was added with commit 6eb7018705de ("vfio-pci:
Move idle devices to D3hot power state") where the following point was
mentioned regarding D3cold state.

"It's tempting to try to use D3cold, but we have no reason to inhibit
hotplug of idle devices and we might get into a loop of having the
device disappear before we have a chance to try to use it."

With the runtime PM, if the user want to prevent going into D3cold then
/sys/bus/pci/devices/.../d3cold_allowed can be set to 0 for the
devices where the above functionality is required instead of
disallowing the D3cold state for all the cases.

Since D3cold state can't be achieved by writing PCI standard PM
config registers, so a feature has been added in DEVICE_FEATURE IOCTL
for low power related handling, which changes the PCI
device from D3hot to D3cold state and then D3cold to D0 state.
The hypervisors can implement virtual ACPI methods. For example,
in guest linux OS if PCI device ACPI node has _PR3 and _PR0 power
resources with _ON/_OFF method, then guest linux OS makes the _OFF call
during D3cold transition and then _ON during D0 transition. The
hypervisor can tap these virtual ACPI calls and then do the D3cold
related IOCTL in vfio driver.

The BAR access needs to be disabled if device is in D3hot state.
Also, there should not be any config access if device is in D3cold
state. For SR-IOV, the PF power state should be higher than VF's power
state.

* Changes in v3

- Rebased patches on v5.18-rc3.
- Marked this series as PATCH instead of RFC.
- Addressed the review comments given in v2.
- Removed the limitation to keep device in D0 state if there is any
access from host side. This is specific to NVIDIA use case and
will be handled separately.
- Used the existing DEVICE_FEATURE IOCTL itself instead of adding new
IOCTL for power management.
- Removed all custom code related with power management in runtime
suspend/resume callbacks and IOCTL handling. Now, the callbacks
contain code related with INTx handling and few other stuffs and
all the PCI state and platform PM handling will be done by PCI core
functions itself.
- Add the support of wake-up in main vfio layer itself since now we have
more vfio/pci based drivers.
- Instead of assigning the 'struct dev_pm_ops' in individual parent
driver, now the vfio_pci_core tself assigns the 'struct dev_pm_ops'.
- Added handling of power management around SR-IOV handling.
- Moved the setting of drvdata in a separate patch.
- Masked INTx before during runtime suspended state.
- Changed the order of patches so that Fix related things are at beginning
of this patch series.
- Removed storing the power state locally and used one new boolean to
track the d3 (D3cold and D3hot) power state
- Removed check for IO access in D3 power state.
- Used another helper function vfio_lock_and_set_power_state() instead
of touching vfio_pci_set_power_state().
- Considered the fixes made in
https://lore.kernel.org/lkml/[email protected]
and updated the patches accordingly.

* Changes in v2
(https://lore.kernel.org/lkml/[email protected])

- Rebased patches on v5.17-rc1.
- Included the patch to handle BAR access in D3cold.
- Included the patch to fix memory leak.
- Made a separate IOCTL that can be used to change the power state from
D3hot to D3cold and D3cold to D0.
- Addressed the review comments given in v1.

* v1
https://lore.kernel.org/lkml/[email protected]/

Abhishek Sahu (8):
vfio/pci: Invalidate mmaps and block the access in D3hot power state
vfio/pci: Change the PF power state to D0 before enabling VFs
vfio/pci: Virtualize PME related registers bits and initialize to zero
vfio/pci: Add support for setting driver data inside core layer
vfio/pci: Enable runtime PM for vfio_pci_core based drivers
vfio: Invoke runtime PM API for IOCTL request
vfio/pci: Mask INTx during runtime suspend
vfio/pci: Add the support for PCI D3cold state

.../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 4 +-
drivers/vfio/pci/mlx5/main.c | 3 +-
drivers/vfio/pci/vfio_pci.c | 4 +-
drivers/vfio/pci/vfio_pci_config.c | 63 ++-
drivers/vfio/pci/vfio_pci_core.c | 358 +++++++++++++++---
drivers/vfio/pci/vfio_pci_intrs.c | 6 +-
drivers/vfio/pci/vfio_pci_rdwr.c | 6 +-
drivers/vfio/vfio.c | 44 ++-
include/linux/vfio_pci_core.h | 12 +-
include/uapi/linux/vfio.h | 18 +
10 files changed, 445 insertions(+), 73 deletions(-)

--
2.17.1


2022-04-25 13:31:06

by Abhishek Sahu

[permalink] [raw]
Subject: [PATCH v3 1/8] vfio/pci: Invalidate mmaps and block the access in D3hot power state

According to [PCIe v5 5.3.1.4.1] for D3hot state

"Configuration and Message requests are the only TLPs accepted by a
Function in the D3Hot state. All other received Requests must be
handled as Unsupported Requests, and all received Completions may
optionally be handled as Unexpected Completions."

Currently, if the vfio PCI device has been put into D3hot state and if
user makes non-config related read/write request in D3hot state, these
requests will be forwarded to the host and this access may cause
issues on a few systems.

This patch leverages the memory-disable support added in commit
'abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on
disabled memory")' to generate page fault on mmap access and
return error for the direct read/write. If the device is D3hot state,
then the error will be returned for MMIO access. The IO access generally
does not make the system unresponsive so the IO access can still happen
in D3hot state. The default value should be returned in this case
without bringing down the complete system.

Also, the power related structure fields need to be protected so
we can use the same 'memory_lock' to protect these fields also.
This protection is mainly needed when user changes the PCI
power state by writing into PCI_PM_CTRL register.
vfio_lock_and_set_power_state() wrapper function will take the
required locks and then it will invoke the vfio_pci_set_power_state().

Signed-off-by: Abhishek Sahu <[email protected]>
---
drivers/vfio/pci/vfio_pci_config.c | 19 ++++++++++++++++++-
drivers/vfio/pci/vfio_pci_core.c | 4 +++-
drivers/vfio/pci/vfio_pci_rdwr.c | 6 ++++--
include/linux/vfio_pci_core.h | 1 +
4 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 6e58b4bf7a60..dd557edae6e1 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -692,6 +692,23 @@ static int __init init_pci_cap_basic_perm(struct perm_bits *perm)
return 0;
}

+/*
+ * It takes all the required locks to protect the access of power related
+ * variables and then invokes vfio_pci_set_power_state().
+ */
+static void
+vfio_lock_and_set_power_state(struct vfio_pci_core_device *vdev,
+ pci_power_t state)
+{
+ if (state >= PCI_D3hot)
+ vfio_pci_zap_and_down_write_memory_lock(vdev);
+ else
+ down_write(&vdev->memory_lock);
+
+ vfio_pci_set_power_state(vdev, state);
+ up_write(&vdev->memory_lock);
+}
+
static int vfio_pm_config_write(struct vfio_pci_core_device *vdev, int pos,
int count, struct perm_bits *perm,
int offset, __le32 val)
@@ -718,7 +735,7 @@ static int vfio_pm_config_write(struct vfio_pci_core_device *vdev, int pos,
break;
}

- vfio_pci_set_power_state(vdev, state);
+ vfio_lock_and_set_power_state(vdev, state);
}

return count;
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 06b6f3594a13..f3dfb033e1c4 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -230,6 +230,8 @@ int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, pci_power_t stat
ret = pci_set_power_state(pdev, state);

if (!ret) {
+ vdev->power_state_d3 = (pdev->current_state >= PCI_D3hot);
+
/* D3 might be unsupported via quirk, skip unless in D3 */
if (needs_save && pdev->current_state >= PCI_D3hot) {
/*
@@ -1398,7 +1400,7 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
mutex_lock(&vdev->vma_lock);
down_read(&vdev->memory_lock);

- if (!__vfio_pci_memory_enabled(vdev)) {
+ if (!__vfio_pci_memory_enabled(vdev) || vdev->power_state_d3) {
ret = VM_FAULT_SIGBUS;
goto up_out;
}
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 82ac1569deb0..fac6bb40a4ce 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -43,7 +43,8 @@ static int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev, \
{ \
if (test_mem) { \
down_read(&vdev->memory_lock); \
- if (!__vfio_pci_memory_enabled(vdev)) { \
+ if (!__vfio_pci_memory_enabled(vdev) || \
+ vdev->power_state_d3) { \
up_read(&vdev->memory_lock); \
return -EIO; \
} \
@@ -70,7 +71,8 @@ static int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev, \
{ \
if (test_mem) { \
down_read(&vdev->memory_lock); \
- if (!__vfio_pci_memory_enabled(vdev)) { \
+ if (!__vfio_pci_memory_enabled(vdev) || \
+ vdev->power_state_d3) { \
up_read(&vdev->memory_lock); \
return -EIO; \
} \
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 48f2dd3c568c..505b2a74a479 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -124,6 +124,7 @@ struct vfio_pci_core_device {
bool needs_reset;
bool nointx;
bool needs_pm_restore;
+ bool power_state_d3;
struct pci_saved_state *pci_saved_state;
struct pci_saved_state *pm_save;
int ioeventfds_nr;
--
2.17.1

2022-04-25 13:36:23

by Abhishek Sahu

[permalink] [raw]
Subject: [PATCH v3 4/8] vfio/pci: Add support for setting driver data inside core layer

The vfio driver is divided into two layers: core layer (implemented in
vfio_pci_core.c) and parent driver (For example, vfio_pci, mlx5_vfio_pci,
hisi_acc_vfio_pci, etc.). All the parent driver calls dev_set_drvdata()
and assigns its own structure as driver data. Some of the callback
functions are implemented in the core layer and these callback functions
provide the reference of 'struct pci_dev' or 'struct device'. Currently,
we use vfio_device_get_from_dev() which provides reference to the
vfio_device for a device. But this function follows long path to extract
the same. There are few cases, where we don't need to go through this
long path if we get this through drvdata.

This patch moves the setting of drvdata inside the core layer. If we see
the current implementation of parent driver structure implementation,
then 'struct vfio_pci_core_device' is a first member so the pointer of
the parent structure and 'struct vfio_pci_core_device' should be the same.

struct hisi_acc_vf_core_device {
struct vfio_pci_core_device core_device;
...
};

struct mlx5vf_pci_core_device {
struct vfio_pci_core_device core_device;
...
};

The vfio_pci.c uses 'struct vfio_pci_core_device' itself.

To support getting the drvdata in both the layers, we can put the
restriction to make 'struct vfio_pci_core_device' as a first member.
Also, vfio_pci_core_register_device() has this validation which makes sure
that this prerequisite is always satisfied.

Signed-off-by: Abhishek Sahu <[email protected]>
---
.../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 4 ++--
drivers/vfio/pci/mlx5/main.c | 3 +--
drivers/vfio/pci/vfio_pci.c | 4 ++--
drivers/vfio/pci/vfio_pci_core.c | 24 ++++++++++++++++---
include/linux/vfio_pci_core.h | 7 +++++-
5 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index 767b5d47631a..c76c09302a8f 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -1274,11 +1274,11 @@ static int hisi_acc_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device
&hisi_acc_vfio_pci_ops);
}

- ret = vfio_pci_core_register_device(&hisi_acc_vdev->core_device);
+ ret = vfio_pci_core_register_device(&hisi_acc_vdev->core_device,
+ hisi_acc_vdev);
if (ret)
goto out_free;

- dev_set_drvdata(&pdev->dev, hisi_acc_vdev);
return 0;

out_free:
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index bbec5d288fee..8689248f66f3 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -614,11 +614,10 @@ static int mlx5vf_pci_probe(struct pci_dev *pdev,
}
}

- ret = vfio_pci_core_register_device(&mvdev->core_device);
+ ret = vfio_pci_core_register_device(&mvdev->core_device, mvdev);
if (ret)
goto out_free;

- dev_set_drvdata(&pdev->dev, mvdev);
return 0;

out_free:
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 2b047469e02f..e0f8027c5cd8 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -151,10 +151,10 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
return -ENOMEM;
vfio_pci_core_init_device(vdev, pdev, &vfio_pci_ops);

- ret = vfio_pci_core_register_device(vdev);
+ ret = vfio_pci_core_register_device(vdev, vdev);
if (ret)
goto out_free;
- dev_set_drvdata(&pdev->dev, vdev);
+
return 0;

out_free:
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 1271728a09db..953ac33b2f5f 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1822,9 +1822,11 @@ void vfio_pci_core_uninit_device(struct vfio_pci_core_device *vdev)
}
EXPORT_SYMBOL_GPL(vfio_pci_core_uninit_device);

-int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
+int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev,
+ void *driver_data)
{
struct pci_dev *pdev = vdev->pdev;
+ struct device *dev = &pdev->dev;
int ret;

if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
@@ -1843,6 +1845,17 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
return -EBUSY;
}

+ /*
+ * The 'struct vfio_pci_core_device' should be the first member of the
+ * of the structure referenced by 'driver_data' so that it can be
+ * retrieved with dev_get_drvdata() inside vfio-pci core layer.
+ */
+ if ((struct vfio_pci_core_device *)driver_data != vdev) {
+ pci_warn(pdev, "Invalid driver data\n");
+ return -EINVAL;
+ }
+ dev_set_drvdata(dev, driver_data);
+
if (pci_is_root_bus(pdev->bus)) {
ret = vfio_assign_device_set(&vdev->vdev, vdev);
} else if (!pci_probe_reset_slot(pdev->slot)) {
@@ -1856,10 +1869,10 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
}

if (ret)
- return ret;
+ goto out_drvdata;
ret = vfio_pci_vf_init(vdev);
if (ret)
- return ret;
+ goto out_drvdata;
ret = vfio_pci_vga_init(vdev);
if (ret)
goto out_vf;
@@ -1890,6 +1903,8 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
vfio_pci_set_power_state(vdev, PCI_D0);
out_vf:
vfio_pci_vf_uninit(vdev);
+out_drvdata:
+ dev_set_drvdata(dev, NULL);
return ret;
}
EXPORT_SYMBOL_GPL(vfio_pci_core_register_device);
@@ -1897,6 +1912,7 @@ EXPORT_SYMBOL_GPL(vfio_pci_core_register_device);
void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev)
{
struct pci_dev *pdev = vdev->pdev;
+ struct device *dev = &pdev->dev;

vfio_pci_core_sriov_configure(pdev, 0);

@@ -1907,6 +1923,8 @@ void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev)

if (!disable_idle_d3)
vfio_pci_set_power_state(vdev, PCI_D0);
+
+ dev_set_drvdata(dev, NULL);
}
EXPORT_SYMBOL_GPL(vfio_pci_core_unregister_device);

diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 505b2a74a479..3c7d65e68340 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -225,7 +225,12 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev);
void vfio_pci_core_init_device(struct vfio_pci_core_device *vdev,
struct pci_dev *pdev,
const struct vfio_device_ops *vfio_pci_ops);
-int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev);
+/*
+ * The 'struct vfio_pci_core_device' should be the first member
+ * of the structure referenced by 'driver_data'.
+ */
+int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev,
+ void *driver_data);
void vfio_pci_core_uninit_device(struct vfio_pci_core_device *vdev);
void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev);
int vfio_pci_core_sriov_configure(struct pci_dev *pdev, int nr_virtfn);
--
2.17.1

2022-04-25 15:04:20

by Abhishek Sahu

[permalink] [raw]
Subject: [PATCH v3 2/8] vfio/pci: Change the PF power state to D0 before enabling VFs

According to [PCIe v5 9.6.2] for PF Device Power Management States

"The PF's power management state (D-state) has global impact on its
associated VFs. If a VF does not implement the Power Management
Capability, then it behaves as if it is in an equivalent
power state of its associated PF.

If a VF implements the Power Management Capability, the Device behavior
is undefined if the PF is placed in a lower power state than the VF.
Software should avoid this situation by placing all VFs in lower power
state before lowering their associated PF's power state."

From the vfio driver side, user can enable SR-IOV when the PF is in D3hot
state. If VF does not implement the Power Management Capability, then
the VF will be actually in D3hot state and then the VF BAR access will
fail. If VF implements the Power Management Capability, then VF will
assume that its current power state is D0 when the PF is D3hot and
in this case, the behavior is undefined.

To support PF power management, we need to create power management
dependency between PF and its VF's. The runtime power management support
may help with this where power management dependencies are supported
through device links. But till we have such support in place, we can
disallow the PF to go into low power state, if PF has VF enabled.
There can be a case, where user first enables the VF's and then
disables the VF's. If there is no user of PF, then the PF can put into
D3hot state again. But with this patch, the PF will still be in D0
state after disabling VF's since detecting this case inside
vfio_pci_core_sriov_configure() requires access to
struct vfio_device::open_count along with its locks. But the subsequent
patches related with runtime PM will handle this case since runtime PM
maintains its own usage count.

Signed-off-by: Abhishek Sahu <[email protected]>
---
drivers/vfio/pci/vfio_pci_core.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index f3dfb033e1c4..1271728a09db 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -217,6 +217,10 @@ int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, pci_power_t stat
bool needs_restore = false, needs_save = false;
int ret;

+ /* Prevent changing power state for PFs with VFs enabled */
+ if (pci_num_vf(pdev) && state > PCI_D0)
+ return -EBUSY;
+
if (vdev->needs_pm_restore) {
if (pdev->current_state < PCI_D3hot && state >= PCI_D3hot) {
pci_save_state(pdev);
@@ -1959,6 +1963,13 @@ int vfio_pci_core_sriov_configure(struct pci_dev *pdev, int nr_virtfn)
}
list_add_tail(&vdev->sriov_pfs_item, &vfio_pci_sriov_pfs);
mutex_unlock(&vfio_pci_sriov_pfs_mutex);
+
+ /*
+ * The PF power state should always be higher than the VF power
+ * state. If PF is in the low power state, then change the
+ * power state to D0 first before enabling SR-IOV.
+ */
+ vfio_pci_set_power_state(vdev, PCI_D0);
ret = pci_enable_sriov(pdev, nr_virtfn);
if (ret)
goto out_del;
--
2.17.1

2022-04-25 16:20:40

by Abhishek Sahu

[permalink] [raw]
Subject: [PATCH v3 7/8] vfio/pci: Mask INTx during runtime suspend

This patch just adds INTx handling during runtime suspend/resume and
all the suspend/resume related code for the user to put device into
low power state will be added in subsequent patches.

The INTx are shared among devices. Whenever any INTx interrupt comes
for the vfio devices, then vfio_intx_handler() will be called for each
device. Inside vfio_intx_handler(), it calls pci_check_and_mask_intx()
and checks if the interrupt has been generated for the current device.
Now, if the device is already in D3cold state, then the config space
can not be read. Attempt to read config space in D3cold state can
cause system unresponsiveness in few systems. To Prevent this, mask
INTx in runtime suspend callback and unmask the same in runtime resume
callback. If INTx has been already masked, then no handling is needed
in runtime suspend/resume callbacks. 'pm_intx_masked' tracks this and
vfio_pci_intx_mask() has been updated to return true if INTx has been
masked inside this function.

For the runtime suspend which is triggered for the no user of vfio
device, the is_intx() will return false and these callbacks won't do
anything.

The MSI/MSI-X are not shared so no handling should be needed for
these. vfio_msihandler() triggers eventfd_signal() without doing any
device specific config access and when user receives this signal then
user tries to perform any config access or IOCTL, then the device will
be moved to D0 state first before servicing any request.

Signed-off-by: Abhishek Sahu <[email protected]>
---
drivers/vfio/pci/vfio_pci_core.c | 35 +++++++++++++++++++++++++++----
drivers/vfio/pci/vfio_pci_intrs.c | 6 +++++-
include/linux/vfio_pci_core.h | 3 ++-
3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index aee5e0cd6137..05a68ca9d9e7 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -262,16 +262,43 @@ int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, pci_power_t stat
}

#ifdef CONFIG_PM
+static int vfio_pci_core_runtime_suspend(struct device *dev)
+{
+ struct vfio_pci_core_device *vdev = dev_get_drvdata(dev);
+
+ /*
+ * If INTx is enabled, then mask INTx before going into runtime
+ * suspended state and unmask the same in the runtime resume.
+ * If INTx has already been masked by the user, then
+ * vfio_pci_intx_mask() will return false and in that case, INTx
+ * should not be unmasked in the runtime resume.
+ */
+ vdev->pm_intx_masked = (is_intx(vdev) && vfio_pci_intx_mask(vdev));
+
+ return 0;
+}
+
+static int vfio_pci_core_runtime_resume(struct device *dev)
+{
+ struct vfio_pci_core_device *vdev = dev_get_drvdata(dev);
+
+ if (vdev->pm_intx_masked)
+ vfio_pci_intx_unmask(vdev);
+
+ return 0;
+}
+
/*
- * The dev_pm_ops needs to be provided to make pci-driver runtime PM working,
- * so use structure without any callbacks.
- *
* The pci-driver core runtime PM routines always save the device state
* before going into suspended state. If the device is going into low power
* state with only with runtime PM ops, then no explicit handling is needed
* for the devices which have NoSoftRst-.
*/
-static const struct dev_pm_ops vfio_pci_core_pm_ops = { };
+static const struct dev_pm_ops vfio_pci_core_pm_ops = {
+ SET_RUNTIME_PM_OPS(vfio_pci_core_runtime_suspend,
+ vfio_pci_core_runtime_resume,
+ NULL)
+};
#endif

int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 6069a11fb51a..1a37db99df48 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -33,10 +33,12 @@ static void vfio_send_intx_eventfd(void *opaque, void *unused)
eventfd_signal(vdev->ctx[0].trigger, 1);
}

-void vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
+/* Returns true if INTx has been masked by this function. */
+bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
{
struct pci_dev *pdev = vdev->pdev;
unsigned long flags;
+ bool intx_masked = false;

spin_lock_irqsave(&vdev->irqlock, flags);

@@ -60,9 +62,11 @@ void vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
disable_irq_nosync(pdev->irq);

vdev->ctx[0].masked = true;
+ intx_masked = true;
}

spin_unlock_irqrestore(&vdev->irqlock, flags);
+ return intx_masked;
}

/*
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 3c7d65e68340..e84f31e44238 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -125,6 +125,7 @@ struct vfio_pci_core_device {
bool nointx;
bool needs_pm_restore;
bool power_state_d3;
+ bool pm_intx_masked;
struct pci_saved_state *pci_saved_state;
struct pci_saved_state *pm_save;
int ioeventfds_nr;
@@ -148,7 +149,7 @@ struct vfio_pci_core_device {
#define is_irq_none(vdev) (!(is_intx(vdev) || is_msi(vdev) || is_msix(vdev)))
#define irq_is(vdev, type) (vdev->irq_type == type)

-extern void vfio_pci_intx_mask(struct vfio_pci_core_device *vdev);
+extern bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev);
extern void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev);

extern int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev,
--
2.17.1

2022-04-26 08:46:18

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v3 1/8] vfio/pci: Invalidate mmaps and block the access in D3hot power state

Hi Abhishek,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on v5.18-rc4]
[also build test WARNING on next-20220422]
[cannot apply to awilliam-vfio/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/intel-lab-lkp/linux/commits/Abhishek-Sahu/vfio-pci-power-management-changes/20220425-173224
base: af2d861d4cd2a4da5137f795ee3509e6f944a25b
config: x86_64-rhel-8.3-kselftests (https://download.01.org/0day-ci/archive/20220426/[email protected]/config)
compiler: gcc-11 (Debian 11.2.0-20) 11.2.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.4-dirty
# https://github.com/intel-lab-lkp/linux/commit/1d48b86a17606c483f200c1734085ab415dbfc3c
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Abhishek-Sahu/vfio-pci-power-management-changes/20220425-173224
git checkout 1d48b86a17606c483f200c1734085ab415dbfc3c
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/vfio/pci/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>


sparse warnings: (new ones prefixed by >>)
>> drivers/vfio/pci/vfio_pci_config.c:703:13: sparse: sparse: restricted pci_power_t degrades to integer
drivers/vfio/pci/vfio_pci_config.c:703:22: sparse: sparse: restricted pci_power_t degrades to integer

vim +703 drivers/vfio/pci/vfio_pci_config.c

694
695 /*
696 * It takes all the required locks to protect the access of power related
697 * variables and then invokes vfio_pci_set_power_state().
698 */
699 static void
700 vfio_lock_and_set_power_state(struct vfio_pci_core_device *vdev,
701 pci_power_t state)
702 {
> 703 if (state >= PCI_D3hot)
704 vfio_pci_zap_and_down_write_memory_lock(vdev);
705 else
706 down_write(&vdev->memory_lock);
707
708 vfio_pci_set_power_state(vdev, state);
709 up_write(&vdev->memory_lock);
710 }
711

--
0-DAY CI Kernel Test Service
https://01.org/lkp

2022-04-27 10:54:59

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v3 1/8] vfio/pci: Invalidate mmaps and block the access in D3hot power state

On Tue, Apr 26, 2022 at 09:42:45AM +0800, kernel test robot wrote:
> ...

> sparse warnings: (new ones prefixed by >>)
> >> drivers/vfio/pci/vfio_pci_config.c:703:13: sparse: sparse: restricted pci_power_t degrades to integer
> drivers/vfio/pci/vfio_pci_config.c:703:22: sparse: sparse: restricted pci_power_t degrades to integer

I dunno what Alex thinks, but we have several of these warnings in
drivers/pci/. I'd like to get rid of them, but we haven't figured out
a good way yet. So this might be something we just live with for now.

> vim +703 drivers/vfio/pci/vfio_pci_config.c
>
> 694
> 695 /*
> 696 * It takes all the required locks to protect the access of power related
> 697 * variables and then invokes vfio_pci_set_power_state().
> 698 */
> 699 static void
> 700 vfio_lock_and_set_power_state(struct vfio_pci_core_device *vdev,
> 701 pci_power_t state)
> 702 {
> > 703 if (state >= PCI_D3hot)
> 704 vfio_pci_zap_and_down_write_memory_lock(vdev);
> 705 else
> 706 down_write(&vdev->memory_lock);
> 707
> 708 vfio_pci_set_power_state(vdev, state);
> 709 up_write(&vdev->memory_lock);
> 710 }
> 711
>
> --
> 0-DAY CI Kernel Test Service
> https://01.org/lkp

2022-05-03 19:15:10

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v3 4/8] vfio/pci: Add support for setting driver data inside core layer

On Mon, 25 Apr 2022 14:56:11 +0530
Abhishek Sahu <[email protected]> wrote:

> The vfio driver is divided into two layers: core layer (implemented in
> vfio_pci_core.c) and parent driver (For example, vfio_pci, mlx5_vfio_pci,
> hisi_acc_vfio_pci, etc.). All the parent driver calls dev_set_drvdata()
> and assigns its own structure as driver data. Some of the callback
> functions are implemented in the core layer and these callback functions
> provide the reference of 'struct pci_dev' or 'struct device'. Currently,
> we use vfio_device_get_from_dev() which provides reference to the
> vfio_device for a device. But this function follows long path to extract
> the same. There are few cases, where we don't need to go through this
> long path if we get this through drvdata.
>
> This patch moves the setting of drvdata inside the core layer. If we see
> the current implementation of parent driver structure implementation,
> then 'struct vfio_pci_core_device' is a first member so the pointer of
> the parent structure and 'struct vfio_pci_core_device' should be the same.
>
> struct hisi_acc_vf_core_device {
> struct vfio_pci_core_device core_device;
> ...
> };
>
> struct mlx5vf_pci_core_device {
> struct vfio_pci_core_device core_device;
> ...
> };
>
> The vfio_pci.c uses 'struct vfio_pci_core_device' itself.
>
> To support getting the drvdata in both the layers, we can put the
> restriction to make 'struct vfio_pci_core_device' as a first member.
> Also, vfio_pci_core_register_device() has this validation which makes sure
> that this prerequisite is always satisfied.
>
> Signed-off-by: Abhishek Sahu <[email protected]>
> ---
> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 4 ++--
> drivers/vfio/pci/mlx5/main.c | 3 +--
> drivers/vfio/pci/vfio_pci.c | 4 ++--
> drivers/vfio/pci/vfio_pci_core.c | 24 ++++++++++++++++---
> include/linux/vfio_pci_core.h | 7 +++++-
> 5 files changed, 32 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> index 767b5d47631a..c76c09302a8f 100644
> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> @@ -1274,11 +1274,11 @@ static int hisi_acc_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device
> &hisi_acc_vfio_pci_ops);
> }
>
> - ret = vfio_pci_core_register_device(&hisi_acc_vdev->core_device);
> + ret = vfio_pci_core_register_device(&hisi_acc_vdev->core_device,
> + hisi_acc_vdev);
> if (ret)
> goto out_free;
>
> - dev_set_drvdata(&pdev->dev, hisi_acc_vdev);
> return 0;
>
> out_free:
> diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
> index bbec5d288fee..8689248f66f3 100644
> --- a/drivers/vfio/pci/mlx5/main.c
> +++ b/drivers/vfio/pci/mlx5/main.c
> @@ -614,11 +614,10 @@ static int mlx5vf_pci_probe(struct pci_dev *pdev,
> }
> }
>
> - ret = vfio_pci_core_register_device(&mvdev->core_device);
> + ret = vfio_pci_core_register_device(&mvdev->core_device, mvdev);
> if (ret)
> goto out_free;
>
> - dev_set_drvdata(&pdev->dev, mvdev);
> return 0;
>
> out_free:
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 2b047469e02f..e0f8027c5cd8 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -151,10 +151,10 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> return -ENOMEM;
> vfio_pci_core_init_device(vdev, pdev, &vfio_pci_ops);
>
> - ret = vfio_pci_core_register_device(vdev);
> + ret = vfio_pci_core_register_device(vdev, vdev);
> if (ret)
> goto out_free;
> - dev_set_drvdata(&pdev->dev, vdev);
> +
> return 0;
>
> out_free:
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 1271728a09db..953ac33b2f5f 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1822,9 +1822,11 @@ void vfio_pci_core_uninit_device(struct vfio_pci_core_device *vdev)
> }
> EXPORT_SYMBOL_GPL(vfio_pci_core_uninit_device);
>
> -int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
> +int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev,
> + void *driver_data)
> {
> struct pci_dev *pdev = vdev->pdev;
> + struct device *dev = &pdev->dev;
> int ret;
>
> if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
> @@ -1843,6 +1845,17 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
> return -EBUSY;
> }
>
> + /*
> + * The 'struct vfio_pci_core_device' should be the first member of the
> + * of the structure referenced by 'driver_data' so that it can be
> + * retrieved with dev_get_drvdata() inside vfio-pci core layer.
> + */
> + if ((struct vfio_pci_core_device *)driver_data != vdev) {
> + pci_warn(pdev, "Invalid driver data\n");
> + return -EINVAL;
> + }

It seems a bit odd to me to add a driver_data arg to the function,
which is actually required to point to the same thing as the existing
function arg. Is this just to codify the requirement? Maybe others
can suggest alternatives.

We also need to collaborate with Jason's patch:

https://lore.kernel.org/all/[email protected]/

(and maybe others)

If we implement a change like proposed here that vfio-pci-core sets
drvdata then we don't need for each variant driver to implement their
own wrapper around err_handler or err_detected as Jason proposes in the
linked patch. Thanks,

Alex

> + dev_set_drvdata(dev, driver_data);
> +
> if (pci_is_root_bus(pdev->bus)) {
> ret = vfio_assign_device_set(&vdev->vdev, vdev);
> } else if (!pci_probe_reset_slot(pdev->slot)) {
> @@ -1856,10 +1869,10 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
> }
>
> if (ret)
> - return ret;
> + goto out_drvdata;
> ret = vfio_pci_vf_init(vdev);
> if (ret)
> - return ret;
> + goto out_drvdata;
> ret = vfio_pci_vga_init(vdev);
> if (ret)
> goto out_vf;
> @@ -1890,6 +1903,8 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
> vfio_pci_set_power_state(vdev, PCI_D0);
> out_vf:
> vfio_pci_vf_uninit(vdev);
> +out_drvdata:
> + dev_set_drvdata(dev, NULL);
> return ret;
> }
> EXPORT_SYMBOL_GPL(vfio_pci_core_register_device);
> @@ -1897,6 +1912,7 @@ EXPORT_SYMBOL_GPL(vfio_pci_core_register_device);
> void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev)
> {
> struct pci_dev *pdev = vdev->pdev;
> + struct device *dev = &pdev->dev;
>
> vfio_pci_core_sriov_configure(pdev, 0);
>
> @@ -1907,6 +1923,8 @@ void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev)
>
> if (!disable_idle_d3)
> vfio_pci_set_power_state(vdev, PCI_D0);
> +
> + dev_set_drvdata(dev, NULL);
> }
> EXPORT_SYMBOL_GPL(vfio_pci_core_unregister_device);
>
> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
> index 505b2a74a479..3c7d65e68340 100644
> --- a/include/linux/vfio_pci_core.h
> +++ b/include/linux/vfio_pci_core.h
> @@ -225,7 +225,12 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev);
> void vfio_pci_core_init_device(struct vfio_pci_core_device *vdev,
> struct pci_dev *pdev,
> const struct vfio_device_ops *vfio_pci_ops);
> -int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev);
> +/*
> + * The 'struct vfio_pci_core_device' should be the first member
> + * of the structure referenced by 'driver_data'.
> + */
> +int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev,
> + void *driver_data);
> void vfio_pci_core_uninit_device(struct vfio_pci_core_device *vdev);
> void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev);
> int vfio_pci_core_sriov_configure(struct pci_dev *pdev, int nr_virtfn);

2022-05-04 12:51:38

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v3 4/8] vfio/pci: Add support for setting driver data inside core layer

On Tue, May 03, 2022 at 11:11:24AM -0600, Alex Williamson wrote:
> > @@ -1843,6 +1845,17 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
> > return -EBUSY;
> > }
> >
> > + /*
> > + * The 'struct vfio_pci_core_device' should be the first member of the
> > + * of the structure referenced by 'driver_data' so that it can be
> > + * retrieved with dev_get_drvdata() inside vfio-pci core layer.
> > + */
> > + if ((struct vfio_pci_core_device *)driver_data != vdev) {
> > + pci_warn(pdev, "Invalid driver data\n");
> > + return -EINVAL;
> > + }
>
> It seems a bit odd to me to add a driver_data arg to the function,
> which is actually required to point to the same thing as the existing
> function arg. Is this just to codify the requirement? Maybe others
> can suggest alternatives.
>
> We also need to collaborate with Jason's patch:
>
> https://lore.kernel.org/all/[email protected]/
>
> (and maybe others)
>
> If we implement a change like proposed here that vfio-pci-core sets
> drvdata then we don't need for each variant driver to implement their
> own wrapper around err_handler or err_detected as Jason proposes in the
> linked patch. Thanks,

Oh, I forgot about this series completely.

Yes, we need to pick a method, either drvdata always points at the
core struct, or we wrapper the core functions.

I have an independent version of the above patch that uses the
drvdata, but I chucked it because it was unnecessary for just a couple
of AER functions.

We should probably go back to it though if we are adding more
functions, as the wrapping is a bit repetitive. I'll go and respin
that series then. Abhishek can base on top of it.

My approach was more type-sane though:

commit 12ba94a72d7aa134af8752d6ff78193acdac93ae
Author: Jason Gunthorpe <[email protected]>
Date: Tue Mar 29 16:32:32 2022 -0300

vfio/pci: Have all VFIO PCI drivers store the vfio_pci_core_device in drvdata

Having a consistent pointer in the drvdata will allow the next patch to
make use of the drvdata from some of the core code helpers.

Use a WARN_ON inside vfio_pci_core_unregister_device() to detect drivers
that miss this.

Signed-off-by: Jason Gunthorpe <[email protected]>

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index 767b5d47631a49..665691967a030c 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -337,6 +337,14 @@ static int vf_qm_cache_wb(struct hisi_qm *qm)
return 0;
}

+static struct hisi_acc_vf_core_device *hssi_acc_drvdata(struct pci_dev *pdev)
+{
+ struct vfio_pci_core_device *core_device = dev_get_drvdata(&pdev->dev);
+
+ return container_of(core_device, struct hisi_acc_vf_core_device,
+ core_device);
+}
+
static void vf_qm_fun_reset(struct hisi_acc_vf_core_device *hisi_acc_vdev,
struct hisi_qm *qm)
{
@@ -962,7 +970,7 @@ hisi_acc_vfio_pci_get_device_state(struct vfio_device *vdev,

static void hisi_acc_vf_pci_aer_reset_done(struct pci_dev *pdev)
{
- struct hisi_acc_vf_core_device *hisi_acc_vdev = dev_get_drvdata(&pdev->dev);
+ struct hisi_acc_vf_core_device *hisi_acc_vdev = hssi_acc_drvdata(pdev);

if (hisi_acc_vdev->core_device.vdev.migration_flags !=
VFIO_MIGRATION_STOP_COPY)
@@ -1278,7 +1286,7 @@ static int hisi_acc_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device
if (ret)
goto out_free;

- dev_set_drvdata(&pdev->dev, hisi_acc_vdev);
+ dev_set_drvdata(&pdev->dev, &hisi_acc_vdev->core_device);
return 0;

out_free:
@@ -1289,7 +1297,7 @@ static int hisi_acc_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device

static void hisi_acc_vfio_pci_remove(struct pci_dev *pdev)
{
- struct hisi_acc_vf_core_device *hisi_acc_vdev = dev_get_drvdata(&pdev->dev);
+ struct hisi_acc_vf_core_device *hisi_acc_vdev = hssi_acc_drvdata(pdev);

vfio_pci_core_unregister_device(&hisi_acc_vdev->core_device);
vfio_pci_core_uninit_device(&hisi_acc_vdev->core_device);
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index bbec5d288fee97..3391f965abd9f0 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -39,6 +39,14 @@ struct mlx5vf_pci_core_device {
struct mlx5_vf_migration_file *saving_migf;
};

+static struct mlx5vf_pci_core_device *mlx5vf_drvdata(struct pci_dev *pdev)
+{
+ struct vfio_pci_core_device *core_device = dev_get_drvdata(&pdev->dev);
+
+ return container_of(core_device, struct mlx5vf_pci_core_device,
+ core_device);
+}
+
static struct page *
mlx5vf_get_migration_page(struct mlx5_vf_migration_file *migf,
unsigned long offset)
@@ -505,7 +513,7 @@ static int mlx5vf_pci_get_device_state(struct vfio_device *vdev,

static void mlx5vf_pci_aer_reset_done(struct pci_dev *pdev)
{
- struct mlx5vf_pci_core_device *mvdev = dev_get_drvdata(&pdev->dev);
+ struct mlx5vf_pci_core_device *mvdev = mlx5vf_drvdata(pdev);

if (!mvdev->migrate_cap)
return;
@@ -618,7 +626,7 @@ static int mlx5vf_pci_probe(struct pci_dev *pdev,
if (ret)
goto out_free;

- dev_set_drvdata(&pdev->dev, mvdev);
+ dev_set_drvdata(&pdev->dev, &mvdev->core_device);
return 0;

out_free:
@@ -629,7 +637,7 @@ static int mlx5vf_pci_probe(struct pci_dev *pdev,

static void mlx5vf_pci_remove(struct pci_dev *pdev)
{
- struct mlx5vf_pci_core_device *mvdev = dev_get_drvdata(&pdev->dev);
+ struct mlx5vf_pci_core_device *mvdev = mlx5vf_drvdata(pdev);

vfio_pci_core_unregister_device(&mvdev->core_device);
vfio_pci_core_uninit_device(&mvdev->core_device);
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 06b6f3594a1316..53ad39d617653d 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -262,6 +262,10 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
u16 cmd;
u8 msix_pos;

+ /* Drivers must set the vfio_pci_core_device to their drvdata */
+ if (WARN_ON(vdev != dev_get_drvdata(&vdev->pdev->dev)))
+ return -EINVAL;
+
vfio_pci_set_power_state(vdev, PCI_D0);

/* Don't allow our initial saved state to include busmaster */

2022-05-04 17:42:37

by Abhishek Sahu

[permalink] [raw]
Subject: Re: [PATCH v3 4/8] vfio/pci: Add support for setting driver data inside core layer

On 5/4/2022 5:50 AM, Jason Gunthorpe wrote:
> On Tue, May 03, 2022 at 11:11:24AM -0600, Alex Williamson wrote:
>>> @@ -1843,6 +1845,17 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
>>> return -EBUSY;
>>> }
>>>
>>> + /*
>>> + * The 'struct vfio_pci_core_device' should be the first member of the
>>> + * of the structure referenced by 'driver_data' so that it can be
>>> + * retrieved with dev_get_drvdata() inside vfio-pci core layer.
>>> + */
>>> + if ((struct vfio_pci_core_device *)driver_data != vdev) {
>>> + pci_warn(pdev, "Invalid driver data\n");
>>> + return -EINVAL;
>>> + }
>>
>> It seems a bit odd to me to add a driver_data arg to the function,
>> which is actually required to point to the same thing as the existing
>> function arg. Is this just to codify the requirement? Maybe others
>> can suggest alternatives.

Yes. It was mainly for enforcing this requirement, otherwise in future
if someone tries to add new driver (or done changes in the existing
structure) and does not follow this convention then the pointer will
be wrong.

>>
>> We also need to collaborate with Jason's patch:
>>
>> https://lore.kernel.org/all/[email protected]/
>>
>> (and maybe others)
>>
>> If we implement a change like proposed here that vfio-pci-core sets
>> drvdata then we don't need for each variant driver to implement their
>> own wrapper around err_handler or err_detected as Jason proposes in the
>> linked patch. Thanks,
>
> Oh, I forgot about this series completely.
>
> Yes, we need to pick a method, either drvdata always points at the
> core struct, or we wrapper the core functions.
>
> I have an independent version of the above patch that uses the
> drvdata, but I chucked it because it was unnecessary for just a couple
> of AER functions.
>
> We should probably go back to it though if we are adding more
> functions, as the wrapping is a bit repetitive. I'll go and respin
> that series then. Abhishek can base on top of it.
>

Sure. I will rebase on top of Jason patch series.

> My approach was more type-sane though:
>
This is also fine.

Initially I wanted to do the same but it requires to have a new
wrapper function for each driver so I implemented in the core layer.

Thanks,
Abhishek

> commit 12ba94a72d7aa134af8752d6ff78193acdac93ae
> Author: Jason Gunthorpe <[email protected]>
> Date: Tue Mar 29 16:32:32 2022 -0300
>
> vfio/pci: Have all VFIO PCI drivers store the vfio_pci_core_device in drvdata
>
> Having a consistent pointer in the drvdata will allow the next patch to
> make use of the drvdata from some of the core code helpers.
>
> Use a WARN_ON inside vfio_pci_core_unregister_device() to detect drivers
> that miss this.
>
> Signed-off-by: Jason Gunthorpe <[email protected]>
>
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> index 767b5d47631a49..665691967a030c 100644
> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> @@ -337,6 +337,14 @@ static int vf_qm_cache_wb(struct hisi_qm *qm)
> return 0;
> }
>
> +static struct hisi_acc_vf_core_device *hssi_acc_drvdata(struct pci_dev *pdev)
> +{
> + struct vfio_pci_core_device *core_device = dev_get_drvdata(&pdev->dev);
> +
> + return container_of(core_device, struct hisi_acc_vf_core_device,
> + core_device);
> +}
> +
> static void vf_qm_fun_reset(struct hisi_acc_vf_core_device *hisi_acc_vdev,
> struct hisi_qm *qm)
> {
> @@ -962,7 +970,7 @@ hisi_acc_vfio_pci_get_device_state(struct vfio_device *vdev,
>
> static void hisi_acc_vf_pci_aer_reset_done(struct pci_dev *pdev)
> {
> - struct hisi_acc_vf_core_device *hisi_acc_vdev = dev_get_drvdata(&pdev->dev);
> + struct hisi_acc_vf_core_device *hisi_acc_vdev = hssi_acc_drvdata(pdev);
>
> if (hisi_acc_vdev->core_device.vdev.migration_flags !=
> VFIO_MIGRATION_STOP_COPY)
> @@ -1278,7 +1286,7 @@ static int hisi_acc_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device
> if (ret)
> goto out_free;
>
> - dev_set_drvdata(&pdev->dev, hisi_acc_vdev);
> + dev_set_drvdata(&pdev->dev, &hisi_acc_vdev->core_device);
> return 0;
>
> out_free:
> @@ -1289,7 +1297,7 @@ static int hisi_acc_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device
>
> static void hisi_acc_vfio_pci_remove(struct pci_dev *pdev)
> {
> - struct hisi_acc_vf_core_device *hisi_acc_vdev = dev_get_drvdata(&pdev->dev);
> + struct hisi_acc_vf_core_device *hisi_acc_vdev = hssi_acc_drvdata(pdev);
>
> vfio_pci_core_unregister_device(&hisi_acc_vdev->core_device);
> vfio_pci_core_uninit_device(&hisi_acc_vdev->core_device);
> diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
> index bbec5d288fee97..3391f965abd9f0 100644
> --- a/drivers/vfio/pci/mlx5/main.c
> +++ b/drivers/vfio/pci/mlx5/main.c
> @@ -39,6 +39,14 @@ struct mlx5vf_pci_core_device {
> struct mlx5_vf_migration_file *saving_migf;
> };
>
> +static struct mlx5vf_pci_core_device *mlx5vf_drvdata(struct pci_dev *pdev)
> +{
> + struct vfio_pci_core_device *core_device = dev_get_drvdata(&pdev->dev);
> +
> + return container_of(core_device, struct mlx5vf_pci_core_device,
> + core_device);
> +}
> +
> static struct page *
> mlx5vf_get_migration_page(struct mlx5_vf_migration_file *migf,
> unsigned long offset)
> @@ -505,7 +513,7 @@ static int mlx5vf_pci_get_device_state(struct vfio_device *vdev,
>
> static void mlx5vf_pci_aer_reset_done(struct pci_dev *pdev)
> {
> - struct mlx5vf_pci_core_device *mvdev = dev_get_drvdata(&pdev->dev);
> + struct mlx5vf_pci_core_device *mvdev = mlx5vf_drvdata(pdev);
>
> if (!mvdev->migrate_cap)
> return;
> @@ -618,7 +626,7 @@ static int mlx5vf_pci_probe(struct pci_dev *pdev,
> if (ret)
> goto out_free;
>
> - dev_set_drvdata(&pdev->dev, mvdev);
> + dev_set_drvdata(&pdev->dev, &mvdev->core_device);
> return 0;
>
> out_free:
> @@ -629,7 +637,7 @@ static int mlx5vf_pci_probe(struct pci_dev *pdev,
>
> static void mlx5vf_pci_remove(struct pci_dev *pdev)
> {
> - struct mlx5vf_pci_core_device *mvdev = dev_get_drvdata(&pdev->dev);
> + struct mlx5vf_pci_core_device *mvdev = mlx5vf_drvdata(pdev);
>
> vfio_pci_core_unregister_device(&mvdev->core_device);
> vfio_pci_core_uninit_device(&mvdev->core_device);
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 06b6f3594a1316..53ad39d617653d 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -262,6 +262,10 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
> u16 cmd;
> u8 msix_pos;
>
> + /* Drivers must set the vfio_pci_core_device to their drvdata */
> + if (WARN_ON(vdev != dev_get_drvdata(&vdev->pdev->dev)))
> + return -EINVAL;
> +
> vfio_pci_set_power_state(vdev, PCI_D0);
>
> /* Don't allow our initial saved state to include busmaster */