2021-06-03 16:09:47

by Max Gurtovoy

[permalink] [raw]
Subject: [RFC PATCH v4 00/11] Introduce vfio-pci-core subsystem

Hi Alex, Cornelia, Jason and Co,

This series split the vfio_pci driver into 2 parts: pci drivers and a
subsystem driver that will also be library of code. The main pci driver,
vfio_pci.ko will be used as before and it will bind to the subsystem
driver vfio_pci_core.ko to register to the VFIO subsystem.

This series is coming to solve some of the issues that were raised in
the previous attempts for extending vfio-pci for vendor specific
functionality:
1. https://lkml.org/lkml/2020/5/17/376 by Yan Zhao.
2. https://www.spinics.net/lists/kernel/msg3903996.html by Longfang Liu

This subsystem framework will also ease on adding new vendor specific
functionality to VFIO devices in the future by allowing another module
to provide the pci_driver that can setup number of details before
registering to VFIO subsystem (such as inject its own operations).

This series also extends the "driver_override" mechanism. We added a flag
for pci drivers that will declare themselves as "driver_override" capable
and only declared drivers can use this mechanism in the PCI subsystem.
Other drivers will not be able to bind to devices that use "driver_override".
Also, the PCI driver matching will always look for ID table and will never
generate dummy "match_all" ID table in the PCI subsystem layer. In this
way, we ensure deterministic behaviour with no races with the original
pci drivers. In order to get the best match for "driver_override" drivers,
one can create a userspace program (example can be found at
https://github.com/maxgurtovoy/linux_tools/blob/main/vfio/bind_vfio_pci_driver.py)
that find the 'best match' according to simple algorithm: "the driver
with the fewest '*' matches wins."
For example, the vfio-pci driver will match to any pci device. So it
will have the maximal '*' matches (for all matching IDs: vendor, device,
subvendor, ...).
In case we are looking for a match to mlx5 based device, we'll have a
match to vfio-pci.ko and mlx5-vfio-pci.ko. We'll prefer mlx5-vfio-pci.ko
since it will have less '*' matches (probably vendor and device IDs will
match). This will work in the future for NVMe/Virtio devices that can
match according to a class code or other criteria.

The main goal of this series is to agree on the vfio_pci module split and the
"driver_override" extensions. The follow-up version will include an extended
mlx5_vfio_pci driver that will support VF suspend/resume as well.

This series applied cleanly on top of vfio reflck re-design (still haven't sent
for review) and can be found at:
https://github.com/Mellanox/NVMEoF-P2P/tree/vfio-v4-external.

Max Gurtovoy (11):
vfio-pci: rename vfio_pci.c to vfio_pci_core.c
vfio-pci: rename vfio_pci_private.h to vfio_pci_core.h
vfio-pci: rename vfio_pci_device to vfio_pci_core_device
vfio-pci: rename ops functions to fit core namings
vfio-pci: include vfio header in vfio_pci_core.h
vfio-pci: introduce vfio_pci.c
vfio-pci: move igd initialization to vfio_pci.c
PCI: add flags field to pci_device_id structure
PCI: add matching checks for driver_override binding
vfio-pci: introduce vfio_pci_core subsystem driver
mlx5-vfio-pci: add new vfio_pci driver for mlx5 devices

Documentation/ABI/testing/sysfs-bus-pci | 6 +-
Documentation/PCI/pci.rst | 1 +
drivers/pci/pci-driver.c | 22 +-
drivers/vfio/pci/Kconfig | 27 +-
drivers/vfio/pci/Makefile | 12 +-
drivers/vfio/pci/mlx5_vfio_pci.c | 130 +
drivers/vfio/pci/vfio_pci.c | 2329 +----------------
drivers/vfio/pci/vfio_pci_config.c | 70 +-
drivers/vfio/pci/vfio_pci_core.c | 2239 ++++++++++++++++
drivers/vfio/pci/vfio_pci_igd.c | 16 +-
drivers/vfio/pci/vfio_pci_intrs.c | 42 +-
drivers/vfio/pci/vfio_pci_rdwr.c | 18 +-
drivers/vfio/pci/vfio_pci_zdev.c | 4 +-
include/linux/mod_devicetable.h | 9 +
include/linux/pci.h | 27 +
.../linux/vfio_pci_core.h | 93 +-
scripts/mod/devicetable-offsets.c | 1 +
scripts/mod/file2alias.c | 8 +-
18 files changed, 2695 insertions(+), 2359 deletions(-)
create mode 100644 drivers/vfio/pci/mlx5_vfio_pci.c
create mode 100644 drivers/vfio/pci/vfio_pci_core.c
rename drivers/vfio/pci/vfio_pci_private.h => include/linux/vfio_pci_core.h (56%)

--
2.21.0


2021-06-03 16:10:23

by Max Gurtovoy

[permalink] [raw]
Subject: [PATCH 04/11] vfio-pci: rename ops functions to fit core namings

This is another preparation patch for separating the vfio_pci driver to
a subsystem driver and a generic pci driver. This patch doesn't change
any logic.

Signed-off-by: Max Gurtovoy <[email protected]>
---
drivers/vfio/pci/vfio_pci_core.c | 36 ++++++++++++++++----------------
1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index a6a9d1aaa185..2d2fa64cc8a0 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -533,7 +533,7 @@ static void vfio_pci_vf_token_user_add(struct vfio_pci_core_device *vdev, int va
vfio_device_put(&pf_vdev->vdev);
}

-static void vfio_pci_release(struct vfio_device *core_vdev)
+static void vfio_pci_core_release(struct vfio_device *core_vdev)
{
struct vfio_pci_core_device *vdev =
container_of(core_vdev, struct vfio_pci_core_device, vdev);
@@ -556,7 +556,7 @@ static void vfio_pci_release(struct vfio_device *core_vdev)
mutex_unlock(&vdev->igate);
}

-static int vfio_pci_open(struct vfio_device *core_vdev)
+static int vfio_pci_core_open(struct vfio_device *core_vdev)
{
struct vfio_pci_core_device *vdev =
container_of(core_vdev, struct vfio_pci_core_device, vdev);
@@ -763,7 +763,7 @@ struct vfio_devices {
int max_index;
};

-static long vfio_pci_ioctl(struct vfio_device *core_vdev,
+static long vfio_pci_core_ioctl(struct vfio_device *core_vdev,
unsigned int cmd, unsigned long arg)
{
struct vfio_pci_core_device *vdev =
@@ -1394,7 +1394,7 @@ static ssize_t vfio_pci_rw(struct vfio_pci_core_device *vdev, char __user *buf,
return -EINVAL;
}

-static ssize_t vfio_pci_read(struct vfio_device *core_vdev, char __user *buf,
+static ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
size_t count, loff_t *ppos)
{
struct vfio_pci_core_device *vdev =
@@ -1406,7 +1406,7 @@ static ssize_t vfio_pci_read(struct vfio_device *core_vdev, char __user *buf,
return vfio_pci_rw(vdev, buf, count, ppos, false);
}

-static ssize_t vfio_pci_write(struct vfio_device *core_vdev, const char __user *buf,
+static ssize_t vfio_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
size_t count, loff_t *ppos)
{
struct vfio_pci_core_device *vdev =
@@ -1612,7 +1612,7 @@ static const struct vm_operations_struct vfio_pci_mmap_ops = {
.fault = vfio_pci_mmap_fault,
};

-static int vfio_pci_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma)
+static int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma)
{
struct vfio_pci_core_device *vdev =
container_of(core_vdev, struct vfio_pci_core_device, vdev);
@@ -1683,7 +1683,7 @@ static int vfio_pci_mmap(struct vfio_device *core_vdev, struct vm_area_struct *v
return 0;
}

-static void vfio_pci_request(struct vfio_device *core_vdev, unsigned int count)
+static void vfio_pci_core_request(struct vfio_device *core_vdev, unsigned int count)
{
struct vfio_pci_core_device *vdev =
container_of(core_vdev, struct vfio_pci_core_device, vdev);
@@ -1799,7 +1799,7 @@ static int vfio_pci_validate_vf_token(struct vfio_pci_core_device *vdev,

#define VF_TOKEN_ARG "vf_token="

-static int vfio_pci_match(struct vfio_device *core_vdev, char *buf)
+static int vfio_pci_core_match(struct vfio_device *core_vdev, char *buf)
{
struct vfio_pci_core_device *vdev =
container_of(core_vdev, struct vfio_pci_core_device, vdev);
@@ -1874,7 +1874,7 @@ static int vfio_pci_reflck_find(struct pci_dev *pdev, void *data)
return 0;
}

-static int vfio_pci_reflck_attach(struct vfio_device *core_vdev)
+static int vfio_pci_core_reflck_attach(struct vfio_device *core_vdev)
{
struct vfio_pci_core_device *vdev =
container_of(core_vdev, struct vfio_pci_core_device, vdev);
@@ -1890,15 +1890,15 @@ static int vfio_pci_reflck_attach(struct vfio_device *core_vdev)

static const struct vfio_device_ops vfio_pci_ops = {
.name = "vfio-pci",
- .open = vfio_pci_open,
- .release = vfio_pci_release,
- .ioctl = vfio_pci_ioctl,
- .read = vfio_pci_read,
- .write = vfio_pci_write,
- .mmap = vfio_pci_mmap,
- .request = vfio_pci_request,
- .match = vfio_pci_match,
- .reflck_attach = vfio_pci_reflck_attach,
+ .open = vfio_pci_core_open,
+ .release = vfio_pci_core_release,
+ .ioctl = vfio_pci_core_ioctl,
+ .read = vfio_pci_core_read,
+ .write = vfio_pci_core_write,
+ .mmap = vfio_pci_core_mmap,
+ .request = vfio_pci_core_request,
+ .match = vfio_pci_core_match,
+ .reflck_attach = vfio_pci_core_reflck_attach,
};

static int vfio_pci_bus_notifier(struct notifier_block *nb,
--
2.21.0

2021-06-03 16:10:34

by Max Gurtovoy

[permalink] [raw]
Subject: [PATCH 01/11] vfio-pci: rename vfio_pci.c to vfio_pci_core.c

This is a preparation patch for separating the vfio_pci driver to a
subsystem driver and a generic pci driver. This patch doesn't change any
logic.

Signed-off-by: Max Gurtovoy <[email protected]>
---
drivers/vfio/pci/Makefile | 2 +-
drivers/vfio/pci/{vfio_pci.c => vfio_pci_core.c} | 0
2 files changed, 1 insertion(+), 1 deletion(-)
rename drivers/vfio/pci/{vfio_pci.c => vfio_pci_core.c} (100%)

diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index 3ff42093962f..66a40488e967 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0-only

-vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
+vfio-pci-y := vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
vfio-pci-$(CONFIG_S390) += vfio_pci_zdev.o

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci_core.c
similarity index 100%
rename from drivers/vfio/pci/vfio_pci.c
rename to drivers/vfio/pci/vfio_pci_core.c
--
2.21.0

2021-06-03 16:10:40

by Max Gurtovoy

[permalink] [raw]
Subject: [PATCH 02/11] vfio-pci: rename vfio_pci_private.h to vfio_pci_core.h

This is a preparation patch for separating the vfio_pci driver to a
subsystem driver and a generic pci driver. This patch doesn't change
any logic.

Signed-off-by: Max Gurtovoy <[email protected]>
---
drivers/vfio/pci/vfio_pci_config.c | 2 +-
drivers/vfio/pci/vfio_pci_core.c | 2 +-
drivers/vfio/pci/{vfio_pci_private.h => vfio_pci_core.h} | 6 +++---
drivers/vfio/pci/vfio_pci_igd.c | 2 +-
drivers/vfio/pci/vfio_pci_intrs.c | 2 +-
drivers/vfio/pci/vfio_pci_rdwr.c | 2 +-
drivers/vfio/pci/vfio_pci_zdev.c | 2 +-
7 files changed, 9 insertions(+), 9 deletions(-)
rename drivers/vfio/pci/{vfio_pci_private.h => vfio_pci_core.h} (98%)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 70e28efbc51f..0bc269c0b03f 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -26,7 +26,7 @@
#include <linux/vfio.h>
#include <linux/slab.h>

-#include "vfio_pci_private.h"
+#include "vfio_pci_core.h"

/* Fake capability ID for standard config space */
#define PCI_CAP_ID_BASIC 0
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index b87fefc475c7..dc99a546ffe5 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -28,7 +28,7 @@
#include <linux/nospec.h>
#include <linux/sched/mm.h>

-#include "vfio_pci_private.h"
+#include "vfio_pci_core.h"

#define DRIVER_VERSION "0.2"
#define DRIVER_AUTHOR "Alex Williamson <[email protected]>"
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_core.h
similarity index 98%
rename from drivers/vfio/pci/vfio_pci_private.h
rename to drivers/vfio/pci/vfio_pci_core.h
index ae83c2eada3a..b73abba881e9 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_core.h
@@ -15,8 +15,8 @@
#include <linux/uuid.h>
#include <linux/notifier.h>

-#ifndef VFIO_PCI_PRIVATE_H
-#define VFIO_PCI_PRIVATE_H
+#ifndef VFIO_PCI_CORE_H
+#define VFIO_PCI_CORE_H

#define VFIO_PCI_OFFSET_SHIFT 40

@@ -205,4 +205,4 @@ static inline int vfio_pci_info_zdev_add_caps(struct vfio_pci_device *vdev,
}
#endif

-#endif /* VFIO_PCI_PRIVATE_H */
+#endif /* VFIO_PCI_CORE_H */
diff --git a/drivers/vfio/pci/vfio_pci_igd.c b/drivers/vfio/pci/vfio_pci_igd.c
index 228df565e9bc..214b6d629b21 100644
--- a/drivers/vfio/pci/vfio_pci_igd.c
+++ b/drivers/vfio/pci/vfio_pci_igd.c
@@ -15,7 +15,7 @@
#include <linux/uaccess.h>
#include <linux/vfio.h>

-#include "vfio_pci_private.h"
+#include "vfio_pci_core.h"

#define OPREGION_SIGNATURE "IntelGraphicsMem"
#define OPREGION_SIZE (8 * 1024)
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 869dce5f134d..df1e8c8c274c 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -20,7 +20,7 @@
#include <linux/wait.h>
#include <linux/slab.h>

-#include "vfio_pci_private.h"
+#include "vfio_pci_core.h"

/*
* INTx
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index a0b5fc8e46f4..667e82726e75 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -17,7 +17,7 @@
#include <linux/vfio.h>
#include <linux/vgaarb.h>

-#include "vfio_pci_private.h"
+#include "vfio_pci_core.h"

#ifdef __LITTLE_ENDIAN
#define vfio_ioread64 ioread64
diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
index 7b011b62c766..ecae0c3d95a0 100644
--- a/drivers/vfio/pci/vfio_pci_zdev.c
+++ b/drivers/vfio/pci/vfio_pci_zdev.c
@@ -19,7 +19,7 @@
#include <asm/pci_clp.h>
#include <asm/pci_io.h>

-#include "vfio_pci_private.h"
+#include "vfio_pci_core.h"

/*
* Add the Base PCI Function information to the device info region.
--
2.21.0

2021-06-03 16:10:44

by Max Gurtovoy

[permalink] [raw]
Subject: [PATCH 06/11] vfio-pci: introduce vfio_pci.c

Split the vfio_pci driver into two logical parts, the 'struct pci_driver'
(vfio_pci.c) and a library of code (vfio_pci_core.c) that helps creating
a VFIO device on top of a PCI device.

As before vfio_pci.ko continues to present the same interface under
sysfs and this change should have no functional impact.

vfio_pci_core.c exposes an interface that is similar to a typical
Linux subsystem (and in the future it will become a separate module), in
that a pci_driver doing probe() can setup a number of details and then
register and create the VFIO char device.

This is a preparation for allowing another module to provide the
pci_driver and allow that module to customize how VFIO is setup, inject
its own operations, and easily extend vendor specific functionality.

Signed-off-by: Max Gurtovoy <[email protected]>
---
drivers/vfio/pci/Makefile | 2 +-
drivers/vfio/pci/vfio_pci.c | 227 ++++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_core.c | 234 +++++--------------------------
drivers/vfio/pci/vfio_pci_core.h | 22 +++
4 files changed, 286 insertions(+), 199 deletions(-)
create mode 100644 drivers/vfio/pci/vfio_pci.c

diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index 66a40488e967..8aa517b4b671 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0-only

-vfio-pci-y := vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
+vfio-pci-y := vfio_pci.o vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
vfio-pci-$(CONFIG_S390) += vfio_pci_zdev.o

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
new file mode 100644
index 000000000000..23a21ecbc674
--- /dev/null
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -0,0 +1,227 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2021, Mellanox Technologies, Ltd. All rights reserved.
+ *
+ * Copyright (C) 2012 Red Hat, Inc. All rights reserved.
+ * Author: Alex Williamson <[email protected]>
+ *
+ * Derived from original vfio:
+ * Copyright 2010 Cisco Systems, Inc. All rights reserved.
+ * Author: Tom Lyon, [email protected]
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/device.h>
+#include <linux/eventfd.h>
+#include <linux/file.h>
+#include <linux/interrupt.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/pm_runtime.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+
+#include "vfio_pci_core.h"
+
+#define DRIVER_VERSION "0.2"
+#define DRIVER_AUTHOR "Alex Williamson <[email protected]>"
+#define DRIVER_DESC "VFIO PCI - User Level meta-driver"
+
+static char ids[1024] __initdata;
+module_param_string(ids, ids, sizeof(ids), 0);
+MODULE_PARM_DESC(ids, "Initial PCI IDs to add to the vfio driver, format is \"vendor:device[:subvendor[:subdevice[:class[:class_mask]]]]\" and multiple comma separated entries can be specified");
+
+static bool enable_sriov;
+#ifdef CONFIG_PCI_IOV
+module_param(enable_sriov, bool, 0644);
+MODULE_PARM_DESC(enable_sriov, "Enable support for SR-IOV configuration. Enabling SR-IOV on a PF typically requires support of the userspace PF driver, enabling VFs without such support may result in non-functional VFs or PF.");
+#endif
+
+static bool disable_denylist;
+module_param(disable_denylist, bool, 0444);
+MODULE_PARM_DESC(disable_denylist, "Disable use of device denylist. Disabling the denylist allows binding to devices with known errata that may lead to exploitable stability or security issues when accessed by untrusted users.");
+
+static bool vfio_pci_dev_in_denylist(struct pci_dev *pdev)
+{
+ switch (pdev->vendor) {
+ case PCI_VENDOR_ID_INTEL:
+ switch (pdev->device) {
+ case PCI_DEVICE_ID_INTEL_QAT_C3XXX:
+ case PCI_DEVICE_ID_INTEL_QAT_C3XXX_VF:
+ case PCI_DEVICE_ID_INTEL_QAT_C62X:
+ case PCI_DEVICE_ID_INTEL_QAT_C62X_VF:
+ case PCI_DEVICE_ID_INTEL_QAT_DH895XCC:
+ case PCI_DEVICE_ID_INTEL_QAT_DH895XCC_VF:
+ return true;
+ default:
+ return false;
+ }
+ }
+
+ return false;
+}
+
+static bool vfio_pci_is_denylisted(struct pci_dev *pdev)
+{
+ if (!vfio_pci_dev_in_denylist(pdev))
+ return false;
+
+ if (disable_denylist) {
+ pci_warn(pdev,
+ "device denylist disabled - allowing device %04x:%04x.\n",
+ pdev->vendor, pdev->device);
+ return false;
+ }
+
+ pci_warn(pdev, "%04x:%04x exists in vfio-pci device denylist, driver probing disallowed.\n",
+ pdev->vendor, pdev->device);
+
+ return true;
+}
+
+static const struct vfio_device_ops vfio_pci_ops = {
+ .name = "vfio-pci",
+ .open = vfio_pci_core_open,
+ .release = vfio_pci_core_release,
+ .ioctl = vfio_pci_core_ioctl,
+ .read = vfio_pci_core_read,
+ .write = vfio_pci_core_write,
+ .mmap = vfio_pci_core_mmap,
+ .request = vfio_pci_core_request,
+ .match = vfio_pci_core_match,
+ .reflck_attach = vfio_pci_core_reflck_attach,
+};
+
+static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+ struct vfio_pci_core_device *vdev;
+ int ret;
+
+ if (vfio_pci_is_denylisted(pdev))
+ return -EINVAL;
+
+ vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
+ if (!vdev)
+ return -ENOMEM;
+
+ ret = vfio_pci_core_register_device(vdev, pdev, &vfio_pci_ops);
+ if (ret)
+ goto out_free;
+
+ return 0;
+
+out_free:
+ kfree(vdev);
+ return ret;
+}
+
+static void vfio_pci_remove(struct pci_dev *pdev)
+{
+ struct vfio_pci_core_device *vdev = dev_get_drvdata(&pdev->dev);
+
+ vfio_pci_core_unregister_device(vdev);
+ kfree(vdev);
+}
+
+static int vfio_pci_sriov_configure(struct pci_dev *pdev, int nr_virtfn)
+{
+ might_sleep();
+
+ if (!enable_sriov)
+ return -ENOENT;
+
+ return vfio_pci_core_sriov_configure(pdev, nr_virtfn);
+}
+
+static struct pci_driver vfio_pci_driver = {
+ .name = "vfio-pci",
+ .id_table = NULL, /* only dynamic ids */
+ .probe = vfio_pci_probe,
+ .remove = vfio_pci_remove,
+ .sriov_configure = vfio_pci_sriov_configure,
+ .err_handler = &vfio_pci_core_err_handlers,
+};
+
+static void __exit vfio_pci_cleanup(void)
+{
+ pci_unregister_driver(&vfio_pci_driver);
+ vfio_pci_core_cleanup();
+}
+
+static void __init vfio_pci_fill_ids(void)
+{
+ char *p, *id;
+ int rc;
+
+ /* no ids passed actually */
+ if (ids[0] == '\0')
+ return;
+
+ /* add ids specified in the module parameter */
+ p = ids;
+ while ((id = strsep(&p, ","))) {
+ unsigned int vendor, device, subvendor = PCI_ANY_ID,
+ subdevice = PCI_ANY_ID, class = 0, class_mask = 0;
+ int fields;
+
+ if (!strlen(id))
+ continue;
+
+ fields = sscanf(id, "%x:%x:%x:%x:%x:%x",
+ &vendor, &device, &subvendor, &subdevice,
+ &class, &class_mask);
+
+ if (fields < 2) {
+ pr_warn("invalid id string \"%s\"\n", id);
+ continue;
+ }
+
+ rc = pci_add_dynid(&vfio_pci_driver, vendor, device,
+ subvendor, subdevice, class, class_mask, 0);
+ if (rc)
+ pr_warn("failed to add dynamic id [%04x:%04x[%04x:%04x]] class %#08x/%08x (%d)\n",
+ vendor, device, subvendor, subdevice,
+ class, class_mask, rc);
+ else
+ pr_info("add [%04x:%04x[%04x:%04x]] class %#08x/%08x\n",
+ vendor, device, subvendor, subdevice,
+ class, class_mask);
+ }
+}
+
+static int __init vfio_pci_init(void)
+{
+ int ret;
+
+ ret = vfio_pci_core_init();
+ if (ret)
+ return ret;
+
+ /* Register and scan for devices */
+ ret = pci_register_driver(&vfio_pci_driver);
+ if (ret)
+ goto out;
+
+ vfio_pci_fill_ids();
+
+ if (disable_denylist)
+ pr_warn("device denylist disabled.\n");
+
+ return 0;
+
+out:
+ vfio_pci_core_cleanup();
+ return ret;
+}
+
+module_init(vfio_pci_init);
+module_exit(vfio_pci_cleanup);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 61cd785c80e8..12d5392c78cc 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -8,8 +8,6 @@
* Author: Tom Lyon, [email protected]
*/

-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
#include <linux/device.h>
#include <linux/eventfd.h>
#include <linux/file.h>
@@ -29,14 +27,6 @@

#include "vfio_pci_core.h"

-#define DRIVER_VERSION "0.2"
-#define DRIVER_AUTHOR "Alex Williamson <[email protected]>"
-#define DRIVER_DESC "VFIO PCI - User Level meta-driver"
-
-static char ids[1024] __initdata;
-module_param_string(ids, ids, sizeof(ids), 0);
-MODULE_PARM_DESC(ids, "Initial PCI IDs to add to the vfio driver, format is \"vendor:device[:subvendor[:subdevice[:class[:class_mask]]]]\" and multiple comma separated entries can be specified");
-
static bool nointxmask;
module_param_named(nointxmask, nointxmask, bool, S_IRUGO | S_IWUSR);
MODULE_PARM_DESC(nointxmask,
@@ -53,16 +43,6 @@ module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR);
MODULE_PARM_DESC(disable_idle_d3,
"Disable using the PCI D3 low power state for idle, unused devices");

-static bool enable_sriov;
-#ifdef CONFIG_PCI_IOV
-module_param(enable_sriov, bool, 0644);
-MODULE_PARM_DESC(enable_sriov, "Enable support for SR-IOV configuration. Enabling SR-IOV on a PF typically requires support of the userspace PF driver, enabling VFs without such support may result in non-functional VFs or PF.");
-#endif
-
-static bool disable_denylist;
-module_param(disable_denylist, bool, 0444);
-MODULE_PARM_DESC(disable_denylist, "Disable use of device denylist. Disabling the denylist allows binding to devices with known errata that may lead to exploitable stability or security issues when accessed by untrusted users.");
-
static inline bool vfio_vga_disabled(void)
{
#ifdef CONFIG_VFIO_PCI_VGA
@@ -72,44 +52,6 @@ static inline bool vfio_vga_disabled(void)
#endif
}

-static bool vfio_pci_dev_in_denylist(struct pci_dev *pdev)
-{
- switch (pdev->vendor) {
- case PCI_VENDOR_ID_INTEL:
- switch (pdev->device) {
- case PCI_DEVICE_ID_INTEL_QAT_C3XXX:
- case PCI_DEVICE_ID_INTEL_QAT_C3XXX_VF:
- case PCI_DEVICE_ID_INTEL_QAT_C62X:
- case PCI_DEVICE_ID_INTEL_QAT_C62X_VF:
- case PCI_DEVICE_ID_INTEL_QAT_DH895XCC:
- case PCI_DEVICE_ID_INTEL_QAT_DH895XCC_VF:
- return true;
- default:
- return false;
- }
- }
-
- return false;
-}
-
-static bool vfio_pci_is_denylisted(struct pci_dev *pdev)
-{
- if (!vfio_pci_dev_in_denylist(pdev))
- return false;
-
- if (disable_denylist) {
- pci_warn(pdev,
- "device denylist disabled - allowing device %04x:%04x.\n",
- pdev->vendor, pdev->device);
- return false;
- }
-
- pci_warn(pdev, "%04x:%04x exists in vfio-pci device denylist, driver probing disallowed.\n",
- pdev->vendor, pdev->device);
-
- return true;
-}
-
/*
* Our VGA arbiter participation is limited since we don't know anything
* about the device itself. However, if the device is the only VGA device
@@ -495,8 +437,6 @@ static void vfio_pci_disable(struct vfio_pci_core_device *vdev)
vfio_pci_set_power_state(vdev, PCI_D3hot);
}

-static struct pci_driver vfio_pci_driver;
-
static struct vfio_pci_core_device *get_pf_vdev(struct vfio_pci_core_device *vdev)
{
struct pci_dev *physfn = pci_physfn(vdev->pdev);
@@ -509,7 +449,7 @@ static struct vfio_pci_core_device *get_pf_vdev(struct vfio_pci_core_device *vde
if (!pf_dev)
return NULL;

- if (pci_dev_driver(physfn) != &vfio_pci_driver) {
+ if (pci_dev_driver(physfn) != pci_dev_driver(vdev->pdev)) {
vfio_device_put(pf_dev);
return NULL;
}
@@ -532,7 +472,7 @@ static void vfio_pci_vf_token_user_add(struct vfio_pci_core_device *vdev, int va
vfio_device_put(&pf_vdev->vdev);
}

-static void vfio_pci_core_release(struct vfio_device *core_vdev)
+void vfio_pci_core_release(struct vfio_device *core_vdev)
{
struct vfio_pci_core_device *vdev =
container_of(core_vdev, struct vfio_pci_core_device, vdev);
@@ -555,7 +495,7 @@ static void vfio_pci_core_release(struct vfio_device *core_vdev)
mutex_unlock(&vdev->igate);
}

-static int vfio_pci_core_open(struct vfio_device *core_vdev)
+int vfio_pci_core_open(struct vfio_device *core_vdev)
{
struct vfio_pci_core_device *vdev =
container_of(core_vdev, struct vfio_pci_core_device, vdev);
@@ -762,8 +702,8 @@ struct vfio_devices {
int max_index;
};

-static long vfio_pci_core_ioctl(struct vfio_device *core_vdev,
- unsigned int cmd, unsigned long arg)
+long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
+ unsigned long arg)
{
struct vfio_pci_core_device *vdev =
container_of(core_vdev, struct vfio_pci_core_device, vdev);
@@ -1393,8 +1333,8 @@ static ssize_t vfio_pci_rw(struct vfio_pci_core_device *vdev, char __user *buf,
return -EINVAL;
}

-static ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
- size_t count, loff_t *ppos)
+ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
+ size_t count, loff_t *ppos)
{
struct vfio_pci_core_device *vdev =
container_of(core_vdev, struct vfio_pci_core_device, vdev);
@@ -1405,8 +1345,8 @@ static ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *bu
return vfio_pci_rw(vdev, buf, count, ppos, false);
}

-static ssize_t vfio_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
- size_t count, loff_t *ppos)
+ssize_t vfio_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
+ size_t count, loff_t *ppos)
{
struct vfio_pci_core_device *vdev =
container_of(core_vdev, struct vfio_pci_core_device, vdev);
@@ -1611,7 +1551,7 @@ static const struct vm_operations_struct vfio_pci_mmap_ops = {
.fault = vfio_pci_mmap_fault,
};

-static int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma)
+int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma)
{
struct vfio_pci_core_device *vdev =
container_of(core_vdev, struct vfio_pci_core_device, vdev);
@@ -1682,7 +1622,7 @@ static int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_stru
return 0;
}

-static void vfio_pci_core_request(struct vfio_device *core_vdev, unsigned int count)
+void vfio_pci_core_request(struct vfio_device *core_vdev, unsigned int count)
{
struct vfio_pci_core_device *vdev =
container_of(core_vdev, struct vfio_pci_core_device, vdev);
@@ -1798,7 +1738,7 @@ static int vfio_pci_validate_vf_token(struct vfio_pci_core_device *vdev,

#define VF_TOKEN_ARG "vf_token="

-static int vfio_pci_core_match(struct vfio_device *core_vdev, char *buf)
+int vfio_pci_core_match(struct vfio_device *core_vdev, char *buf)
{
struct vfio_pci_core_device *vdev =
container_of(core_vdev, struct vfio_pci_core_device, vdev);
@@ -1852,12 +1792,14 @@ static int vfio_pci_reflck_find(struct pci_dev *pdev, void *data)
{
struct vfio_reflck **preflck = data;
struct vfio_device *device;
+ struct vfio_pci_core_device *vdev;

device = vfio_device_get_from_dev(&pdev->dev);
if (!device)
return 0;

- if (pci_dev_driver(pdev) != &vfio_pci_driver) {
+ vdev = container_of(device, struct vfio_pci_core_device, vdev);
+ if (pci_dev_driver(pdev) != pci_dev_driver(vdev->pdev)) {
vfio_device_put(device);
return 0;
}
@@ -1873,7 +1815,7 @@ static int vfio_pci_reflck_find(struct pci_dev *pdev, void *data)
return 0;
}

-static int vfio_pci_core_reflck_attach(struct vfio_device *core_vdev)
+int vfio_pci_core_reflck_attach(struct vfio_device *core_vdev)
{
struct vfio_pci_core_device *vdev =
container_of(core_vdev, struct vfio_pci_core_device, vdev);
@@ -1887,19 +1829,6 @@ static int vfio_pci_core_reflck_attach(struct vfio_device *core_vdev)
return PTR_ERR_OR_ZERO(core_vdev->reflck);
}

-static const struct vfio_device_ops vfio_pci_ops = {
- .name = "vfio-pci",
- .open = vfio_pci_core_open,
- .release = vfio_pci_core_release,
- .ioctl = vfio_pci_core_ioctl,
- .read = vfio_pci_core_read,
- .write = vfio_pci_core_write,
- .mmap = vfio_pci_core_mmap,
- .request = vfio_pci_core_request,
- .match = vfio_pci_core_match,
- .reflck_attach = vfio_pci_core_reflck_attach,
-};
-
static int vfio_pci_bus_notifier(struct notifier_block *nb,
unsigned long action, void *data)
{
@@ -1914,12 +1843,12 @@ static int vfio_pci_bus_notifier(struct notifier_block *nb,
pci_info(vdev->pdev, "Captured SR-IOV VF %s driver_override\n",
pci_name(pdev));
pdev->driver_override = kasprintf(GFP_KERNEL, "%s",
- vfio_pci_ops.name);
+ vdev->vdev.ops->name);
} else if (action == BUS_NOTIFY_BOUND_DRIVER &&
pdev->is_virtfn && physfn == vdev->pdev) {
struct pci_driver *drv = pci_dev_driver(pdev);

- if (drv && drv != &vfio_pci_driver)
+ if (drv && drv != pci_dev_driver(vdev->pdev))
pci_warn(vdev->pdev,
"VF %s bound to driver %s while PF bound to vfio-pci\n",
pci_name(pdev), drv->name);
@@ -1990,15 +1919,13 @@ static void vfio_pci_vga_uninit(struct vfio_pci_core_device *vdev)
VGA_RSRC_LEGACY_MEM);
}

-static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev,
+ struct pci_dev *pdev,
+ const struct vfio_device_ops *vfio_pci_ops)
{
- struct vfio_pci_core_device *vdev;
struct iommu_group *group;
int ret;

- if (vfio_pci_is_denylisted(pdev))
- return -EINVAL;
-
if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
return -EINVAL;

@@ -2019,12 +1946,6 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (!group)
return -EINVAL;

- vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
- if (!vdev) {
- ret = -ENOMEM;
- goto out_group_put;
- }
-
vdev->pdev = pdev;
vdev->irq_type = VFIO_PCI_NUM_IRQS;
mutex_init(&vdev->igate);
@@ -2036,9 +1957,9 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
INIT_LIST_HEAD(&vdev->vma_list);
init_rwsem(&vdev->memory_lock);

- ret = vfio_init_group_dev(&vdev->vdev, &pdev->dev, &vfio_pci_ops);
+ ret = vfio_init_group_dev(&vdev->vdev, &pdev->dev, vfio_pci_ops);
if (ret)
- goto out_free;
+ goto out_group_put;
ret = vfio_pci_vf_init(vdev);
if (ret)
goto out_uninit;
@@ -2075,17 +1996,15 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
vfio_pci_vf_uninit(vdev);
out_uninit:
vfio_uninit_group_dev(&vdev->vdev);
-out_free:
kfree(vdev->pm_save);
- kfree(vdev);
out_group_put:
vfio_iommu_group_put(group, &pdev->dev);
return ret;
}

-static void vfio_pci_remove(struct pci_dev *pdev)
+void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev)
{
- struct vfio_pci_core_device *vdev = dev_get_drvdata(&pdev->dev);
+ struct pci_dev *pdev = vdev->pdev;

pci_disable_sriov(pdev);

@@ -2103,7 +2022,6 @@ static void vfio_pci_remove(struct pci_dev *pdev)
mutex_destroy(&vdev->ioeventfds_lock);
kfree(vdev->region);
kfree(vdev->pm_save);
- kfree(vdev);
}

static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
@@ -2130,16 +2048,13 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
return PCI_ERS_RESULT_CAN_RECOVER;
}

-static int vfio_pci_sriov_configure(struct pci_dev *pdev, int nr_virtfn)
+int vfio_pci_core_sriov_configure(struct pci_dev *pdev, int nr_virtfn)
{
struct vfio_device *device;
int ret = 0;

might_sleep();

- if (!enable_sriov)
- return -ENOENT;
-
device = vfio_device_get_from_dev(&pdev->dev);
if (!device)
return -ENODEV;
@@ -2154,19 +2069,10 @@ static int vfio_pci_sriov_configure(struct pci_dev *pdev, int nr_virtfn)
return ret < 0 ? ret : nr_virtfn;
}

-static const struct pci_error_handlers vfio_err_handlers = {
+const struct pci_error_handlers vfio_pci_core_err_handlers = {
.error_detected = vfio_pci_aer_err_detected,
};

-static struct pci_driver vfio_pci_driver = {
- .name = "vfio-pci",
- .id_table = NULL, /* only dynamic ids */
- .probe = vfio_pci_probe,
- .remove = vfio_pci_remove,
- .sriov_configure = vfio_pci_sriov_configure,
- .err_handler = &vfio_err_handlers,
-};
-
static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data)
{
struct vfio_devices *devs = data;
@@ -2180,12 +2086,12 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data)
if (!device)
return -EINVAL;

- if (pci_dev_driver(pdev) != &vfio_pci_driver) {
+ vdev = container_of(device, struct vfio_pci_core_device, vdev);
+ if (pci_dev_driver(pdev) != pci_dev_driver(vdev->pdev)) {
vfio_device_put(device);
return -EBUSY;
}

- vdev = container_of(device, struct vfio_pci_core_device, vdev);

/* Fault if the device is not unused */
if (device->refcnt) {
@@ -2210,12 +2116,12 @@ static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data)
if (!device)
return -EINVAL;

- if (pci_dev_driver(pdev) != &vfio_pci_driver) {
+ vdev = container_of(device, struct vfio_pci_core_device, vdev);
+ if (pci_dev_driver(pdev) != pci_dev_driver(vdev->pdev)) {
vfio_device_put(device);
return -EBUSY;
}

- vdev = container_of(device, struct vfio_pci_core_device, vdev);

/*
* Locking multiple devices is prone to deadlock, runaway and
@@ -2305,83 +2211,15 @@ static void vfio_pci_try_bus_reset(struct vfio_pci_core_device *vdev)
kfree(devs.devices);
}

-static void __exit vfio_pci_cleanup(void)
+/* This will become the __exit function of vfio_pci_core.ko */
+void vfio_pci_core_cleanup(void)
{
- pci_unregister_driver(&vfio_pci_driver);
vfio_pci_uninit_perm_bits();
}

-static void __init vfio_pci_fill_ids(void)
+/* This will become the __init function of vfio_pci_core.ko */
+int __init vfio_pci_core_init(void)
{
- char *p, *id;
- int rc;
-
- /* no ids passed actually */
- if (ids[0] == '\0')
- return;
-
- /* add ids specified in the module parameter */
- p = ids;
- while ((id = strsep(&p, ","))) {
- unsigned int vendor, device, subvendor = PCI_ANY_ID,
- subdevice = PCI_ANY_ID, class = 0, class_mask = 0;
- int fields;
-
- if (!strlen(id))
- continue;
-
- fields = sscanf(id, "%x:%x:%x:%x:%x:%x",
- &vendor, &device, &subvendor, &subdevice,
- &class, &class_mask);
-
- if (fields < 2) {
- pr_warn("invalid id string \"%s\"\n", id);
- continue;
- }
-
- rc = pci_add_dynid(&vfio_pci_driver, vendor, device,
- subvendor, subdevice, class, class_mask, 0);
- if (rc)
- pr_warn("failed to add dynamic id [%04x:%04x[%04x:%04x]] class %#08x/%08x (%d)\n",
- vendor, device, subvendor, subdevice,
- class, class_mask, rc);
- else
- pr_info("add [%04x:%04x[%04x:%04x]] class %#08x/%08x\n",
- vendor, device, subvendor, subdevice,
- class, class_mask);
- }
-}
-
-static int __init vfio_pci_init(void)
-{
- int ret;
-
/* Allocate shared config space permission data used by all devices */
- ret = vfio_pci_init_perm_bits();
- if (ret)
- return ret;
-
- /* Register and scan for devices */
- ret = pci_register_driver(&vfio_pci_driver);
- if (ret)
- goto out_driver;
-
- vfio_pci_fill_ids();
-
- if (disable_denylist)
- pr_warn("device denylist disabled.\n");
-
- return 0;
-
-out_driver:
- vfio_pci_uninit_perm_bits();
- return ret;
+ return vfio_pci_init_perm_bits();
}
-
-module_init(vfio_pci_init);
-module_exit(vfio_pci_cleanup);
-
-MODULE_VERSION(DRIVER_VERSION);
-MODULE_LICENSE("GPL v2");
-MODULE_AUTHOR(DRIVER_AUTHOR);
-MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/drivers/vfio/pci/vfio_pci_core.h b/drivers/vfio/pci/vfio_pci_core.h
index cb24d229df66..245862d5d6e4 100644
--- a/drivers/vfio/pci/vfio_pci_core.h
+++ b/drivers/vfio/pci/vfio_pci_core.h
@@ -206,4 +206,26 @@ static inline int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
}
#endif

+/* Will be exported for vfio pci drivers usage */
+void vfio_pci_core_cleanup(void);
+int vfio_pci_core_init(void);
+void vfio_pci_core_release(struct vfio_device *core_vdev);
+int vfio_pci_core_open(struct vfio_device *core_vdev);
+int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev,
+ struct pci_dev *pdev,
+ const struct vfio_device_ops *vfio_pci_ops);
+void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev);
+int vfio_pci_core_sriov_configure(struct pci_dev *pdev, int nr_virtfn);
+extern const struct pci_error_handlers vfio_pci_core_err_handlers;
+long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
+ unsigned long arg);
+ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
+ size_t count, loff_t *ppos);
+ssize_t vfio_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
+ size_t count, loff_t *ppos);
+int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma);
+void vfio_pci_core_request(struct vfio_device *core_vdev, unsigned int count);
+int vfio_pci_core_match(struct vfio_device *core_vdev, char *buf);
+int vfio_pci_core_reflck_attach(struct vfio_device *core_vdev);
+
#endif /* VFIO_PCI_CORE_H */
--
2.21.0

2021-06-03 16:10:55

by Max Gurtovoy

[permalink] [raw]
Subject: [PATCH 03/11] vfio-pci: rename vfio_pci_device to vfio_pci_core_device

This is a preparation patch for separating the vfio_pci driver to a
subsystem driver and a generic pci driver. This patch doesn't change
any logic. The new vfio_pci_core_device structure will be the main
structure of the core driver and later on vfio_pci_device structure will
be the main structure of the generic vfio_pci driver.

Signed-off-by: Max Gurtovoy <[email protected]>
---
drivers/vfio/pci/vfio_pci_config.c | 68 ++++++++--------
drivers/vfio/pci/vfio_pci_core.c | 124 ++++++++++++++---------------
drivers/vfio/pci/vfio_pci_core.h | 52 ++++++------
drivers/vfio/pci/vfio_pci_igd.c | 14 ++--
drivers/vfio/pci/vfio_pci_intrs.c | 40 +++++-----
drivers/vfio/pci/vfio_pci_rdwr.c | 16 ++--
drivers/vfio/pci/vfio_pci_zdev.c | 2 +-
7 files changed, 158 insertions(+), 158 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 0bc269c0b03f..1f034f768a27 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -108,9 +108,9 @@ static const u16 pci_ext_cap_length[PCI_EXT_CAP_ID_MAX + 1] = {
struct perm_bits {
u8 *virt; /* read/write virtual data, not hw */
u8 *write; /* writeable bits */
- int (*readfn)(struct vfio_pci_device *vdev, int pos, int count,
+ int (*readfn)(struct vfio_pci_core_device *vdev, int pos, int count,
struct perm_bits *perm, int offset, __le32 *val);
- int (*writefn)(struct vfio_pci_device *vdev, int pos, int count,
+ int (*writefn)(struct vfio_pci_core_device *vdev, int pos, int count,
struct perm_bits *perm, int offset, __le32 val);
};

@@ -171,7 +171,7 @@ static int vfio_user_config_write(struct pci_dev *pdev, int offset,
return ret;
}

-static int vfio_default_config_read(struct vfio_pci_device *vdev, int pos,
+static int vfio_default_config_read(struct vfio_pci_core_device *vdev, int pos,
int count, struct perm_bits *perm,
int offset, __le32 *val)
{
@@ -197,7 +197,7 @@ static int vfio_default_config_read(struct vfio_pci_device *vdev, int pos,
return count;
}

-static int vfio_default_config_write(struct vfio_pci_device *vdev, int pos,
+static int vfio_default_config_write(struct vfio_pci_core_device *vdev, int pos,
int count, struct perm_bits *perm,
int offset, __le32 val)
{
@@ -244,7 +244,7 @@ static int vfio_default_config_write(struct vfio_pci_device *vdev, int pos,
}

/* Allow direct read from hardware, except for capability next pointer */
-static int vfio_direct_config_read(struct vfio_pci_device *vdev, int pos,
+static int vfio_direct_config_read(struct vfio_pci_core_device *vdev, int pos,
int count, struct perm_bits *perm,
int offset, __le32 *val)
{
@@ -269,7 +269,7 @@ static int vfio_direct_config_read(struct vfio_pci_device *vdev, int pos,
}

/* Raw access skips any kind of virtualization */
-static int vfio_raw_config_write(struct vfio_pci_device *vdev, int pos,
+static int vfio_raw_config_write(struct vfio_pci_core_device *vdev, int pos,
int count, struct perm_bits *perm,
int offset, __le32 val)
{
@@ -282,7 +282,7 @@ static int vfio_raw_config_write(struct vfio_pci_device *vdev, int pos,
return count;
}

-static int vfio_raw_config_read(struct vfio_pci_device *vdev, int pos,
+static int vfio_raw_config_read(struct vfio_pci_core_device *vdev, int pos,
int count, struct perm_bits *perm,
int offset, __le32 *val)
{
@@ -296,7 +296,7 @@ static int vfio_raw_config_read(struct vfio_pci_device *vdev, int pos,
}

/* Virt access uses only virtualization */
-static int vfio_virt_config_write(struct vfio_pci_device *vdev, int pos,
+static int vfio_virt_config_write(struct vfio_pci_core_device *vdev, int pos,
int count, struct perm_bits *perm,
int offset, __le32 val)
{
@@ -304,7 +304,7 @@ static int vfio_virt_config_write(struct vfio_pci_device *vdev, int pos,
return count;
}

-static int vfio_virt_config_read(struct vfio_pci_device *vdev, int pos,
+static int vfio_virt_config_read(struct vfio_pci_core_device *vdev, int pos,
int count, struct perm_bits *perm,
int offset, __le32 *val)
{
@@ -396,7 +396,7 @@ static inline void p_setd(struct perm_bits *p, int off, u32 virt, u32 write)
}

/* Caller should hold memory_lock semaphore */
-bool __vfio_pci_memory_enabled(struct vfio_pci_device *vdev)
+bool __vfio_pci_memory_enabled(struct vfio_pci_core_device *vdev)
{
struct pci_dev *pdev = vdev->pdev;
u16 cmd = le16_to_cpu(*(__le16 *)&vdev->vconfig[PCI_COMMAND]);
@@ -413,7 +413,7 @@ bool __vfio_pci_memory_enabled(struct vfio_pci_device *vdev)
* Restore the *real* BARs after we detect a FLR or backdoor reset.
* (backdoor = some device specific technique that we didn't catch)
*/
-static void vfio_bar_restore(struct vfio_pci_device *vdev)
+static void vfio_bar_restore(struct vfio_pci_core_device *vdev)
{
struct pci_dev *pdev = vdev->pdev;
u32 *rbar = vdev->rbar;
@@ -460,7 +460,7 @@ static __le32 vfio_generate_bar_flags(struct pci_dev *pdev, int bar)
* Pretend we're hardware and tweak the values of the *virtual* PCI BARs
* to reflect the hardware capabilities. This implements BAR sizing.
*/
-static void vfio_bar_fixup(struct vfio_pci_device *vdev)
+static void vfio_bar_fixup(struct vfio_pci_core_device *vdev)
{
struct pci_dev *pdev = vdev->pdev;
int i;
@@ -514,7 +514,7 @@ static void vfio_bar_fixup(struct vfio_pci_device *vdev)
vdev->bardirty = false;
}

-static int vfio_basic_config_read(struct vfio_pci_device *vdev, int pos,
+static int vfio_basic_config_read(struct vfio_pci_core_device *vdev, int pos,
int count, struct perm_bits *perm,
int offset, __le32 *val)
{
@@ -536,7 +536,7 @@ static int vfio_basic_config_read(struct vfio_pci_device *vdev, int pos,
}

/* Test whether BARs match the value we think they should contain */
-static bool vfio_need_bar_restore(struct vfio_pci_device *vdev)
+static bool vfio_need_bar_restore(struct vfio_pci_core_device *vdev)
{
int i = 0, pos = PCI_BASE_ADDRESS_0, ret;
u32 bar;
@@ -552,7 +552,7 @@ static bool vfio_need_bar_restore(struct vfio_pci_device *vdev)
return false;
}

-static int vfio_basic_config_write(struct vfio_pci_device *vdev, int pos,
+static int vfio_basic_config_write(struct vfio_pci_core_device *vdev, int pos,
int count, struct perm_bits *perm,
int offset, __le32 val)
{
@@ -692,7 +692,7 @@ static int __init init_pci_cap_basic_perm(struct perm_bits *perm)
return 0;
}

-static int vfio_pm_config_write(struct vfio_pci_device *vdev, int pos,
+static int vfio_pm_config_write(struct vfio_pci_core_device *vdev, int pos,
int count, struct perm_bits *perm,
int offset, __le32 val)
{
@@ -747,7 +747,7 @@ static int __init init_pci_cap_pm_perm(struct perm_bits *perm)
return 0;
}

-static int vfio_vpd_config_write(struct vfio_pci_device *vdev, int pos,
+static int vfio_vpd_config_write(struct vfio_pci_core_device *vdev, int pos,
int count, struct perm_bits *perm,
int offset, __le32 val)
{
@@ -829,7 +829,7 @@ static int __init init_pci_cap_pcix_perm(struct perm_bits *perm)
return 0;
}

-static int vfio_exp_config_write(struct vfio_pci_device *vdev, int pos,
+static int vfio_exp_config_write(struct vfio_pci_core_device *vdev, int pos,
int count, struct perm_bits *perm,
int offset, __le32 val)
{
@@ -913,7 +913,7 @@ static int __init init_pci_cap_exp_perm(struct perm_bits *perm)
return 0;
}

-static int vfio_af_config_write(struct vfio_pci_device *vdev, int pos,
+static int vfio_af_config_write(struct vfio_pci_core_device *vdev, int pos,
int count, struct perm_bits *perm,
int offset, __le32 val)
{
@@ -1072,7 +1072,7 @@ int __init vfio_pci_init_perm_bits(void)
return ret;
}

-static int vfio_find_cap_start(struct vfio_pci_device *vdev, int pos)
+static int vfio_find_cap_start(struct vfio_pci_core_device *vdev, int pos)
{
u8 cap;
int base = (pos >= PCI_CFG_SPACE_SIZE) ? PCI_CFG_SPACE_SIZE :
@@ -1089,7 +1089,7 @@ static int vfio_find_cap_start(struct vfio_pci_device *vdev, int pos)
return pos;
}

-static int vfio_msi_config_read(struct vfio_pci_device *vdev, int pos,
+static int vfio_msi_config_read(struct vfio_pci_core_device *vdev, int pos,
int count, struct perm_bits *perm,
int offset, __le32 *val)
{
@@ -1109,7 +1109,7 @@ static int vfio_msi_config_read(struct vfio_pci_device *vdev, int pos,
return vfio_default_config_read(vdev, pos, count, perm, offset, val);
}

-static int vfio_msi_config_write(struct vfio_pci_device *vdev, int pos,
+static int vfio_msi_config_write(struct vfio_pci_core_device *vdev, int pos,
int count, struct perm_bits *perm,
int offset, __le32 val)
{
@@ -1189,7 +1189,7 @@ static int init_pci_cap_msi_perm(struct perm_bits *perm, int len, u16 flags)
}

/* Determine MSI CAP field length; initialize msi_perms on 1st call per vdev */
-static int vfio_msi_cap_len(struct vfio_pci_device *vdev, u8 pos)
+static int vfio_msi_cap_len(struct vfio_pci_core_device *vdev, u8 pos)
{
struct pci_dev *pdev = vdev->pdev;
int len, ret;
@@ -1222,7 +1222,7 @@ static int vfio_msi_cap_len(struct vfio_pci_device *vdev, u8 pos)
}

/* Determine extended capability length for VC (2 & 9) and MFVC */
-static int vfio_vc_cap_len(struct vfio_pci_device *vdev, u16 pos)
+static int vfio_vc_cap_len(struct vfio_pci_core_device *vdev, u16 pos)
{
struct pci_dev *pdev = vdev->pdev;
u32 tmp;
@@ -1263,7 +1263,7 @@ static int vfio_vc_cap_len(struct vfio_pci_device *vdev, u16 pos)
return len;
}

-static int vfio_cap_len(struct vfio_pci_device *vdev, u8 cap, u8 pos)
+static int vfio_cap_len(struct vfio_pci_core_device *vdev, u8 cap, u8 pos)
{
struct pci_dev *pdev = vdev->pdev;
u32 dword;
@@ -1338,7 +1338,7 @@ static int vfio_cap_len(struct vfio_pci_device *vdev, u8 cap, u8 pos)
return 0;
}

-static int vfio_ext_cap_len(struct vfio_pci_device *vdev, u16 ecap, u16 epos)
+static int vfio_ext_cap_len(struct vfio_pci_core_device *vdev, u16 ecap, u16 epos)
{
struct pci_dev *pdev = vdev->pdev;
u8 byte;
@@ -1412,7 +1412,7 @@ static int vfio_ext_cap_len(struct vfio_pci_device *vdev, u16 ecap, u16 epos)
return 0;
}

-static int vfio_fill_vconfig_bytes(struct vfio_pci_device *vdev,
+static int vfio_fill_vconfig_bytes(struct vfio_pci_core_device *vdev,
int offset, int size)
{
struct pci_dev *pdev = vdev->pdev;
@@ -1459,7 +1459,7 @@ static int vfio_fill_vconfig_bytes(struct vfio_pci_device *vdev,
return ret;
}

-static int vfio_cap_init(struct vfio_pci_device *vdev)
+static int vfio_cap_init(struct vfio_pci_core_device *vdev)
{
struct pci_dev *pdev = vdev->pdev;
u8 *map = vdev->pci_config_map;
@@ -1549,7 +1549,7 @@ static int vfio_cap_init(struct vfio_pci_device *vdev)
return 0;
}

-static int vfio_ecap_init(struct vfio_pci_device *vdev)
+static int vfio_ecap_init(struct vfio_pci_core_device *vdev)
{
struct pci_dev *pdev = vdev->pdev;
u8 *map = vdev->pci_config_map;
@@ -1669,7 +1669,7 @@ static const struct pci_device_id known_bogus_vf_intx_pin[] = {
* for each area requiring emulated bits, but the array of pointers
* would be comparable in size (at least for standard config space).
*/
-int vfio_config_init(struct vfio_pci_device *vdev)
+int vfio_config_init(struct vfio_pci_core_device *vdev)
{
struct pci_dev *pdev = vdev->pdev;
u8 *map, *vconfig;
@@ -1773,7 +1773,7 @@ int vfio_config_init(struct vfio_pci_device *vdev)
return pcibios_err_to_errno(ret);
}

-void vfio_config_free(struct vfio_pci_device *vdev)
+void vfio_config_free(struct vfio_pci_core_device *vdev)
{
kfree(vdev->vconfig);
vdev->vconfig = NULL;
@@ -1790,7 +1790,7 @@ void vfio_config_free(struct vfio_pci_device *vdev)
* Find the remaining number of bytes in a dword that match the given
* position. Stop at either the end of the capability or the dword boundary.
*/
-static size_t vfio_pci_cap_remaining_dword(struct vfio_pci_device *vdev,
+static size_t vfio_pci_cap_remaining_dword(struct vfio_pci_core_device *vdev,
loff_t pos)
{
u8 cap = vdev->pci_config_map[pos];
@@ -1802,7 +1802,7 @@ static size_t vfio_pci_cap_remaining_dword(struct vfio_pci_device *vdev,
return i;
}

-static ssize_t vfio_config_do_rw(struct vfio_pci_device *vdev, char __user *buf,
+static ssize_t vfio_config_do_rw(struct vfio_pci_core_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite)
{
struct pci_dev *pdev = vdev->pdev;
@@ -1885,7 +1885,7 @@ static ssize_t vfio_config_do_rw(struct vfio_pci_device *vdev, char __user *buf,
return ret;
}

-ssize_t vfio_pci_config_rw(struct vfio_pci_device *vdev, char __user *buf,
+ssize_t vfio_pci_config_rw(struct vfio_pci_core_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite)
{
size_t done = 0;
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index dc99a546ffe5..a6a9d1aaa185 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -121,7 +121,7 @@ static bool vfio_pci_is_denylisted(struct pci_dev *pdev)
*/
static unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga)
{
- struct vfio_pci_device *vdev = opaque;
+ struct vfio_pci_core_device *vdev = opaque;
struct pci_dev *tmp = NULL, *pdev = vdev->pdev;
unsigned char max_busnr;
unsigned int decodes;
@@ -155,7 +155,7 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
}

-static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev)
+static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev)
{
struct resource *res;
int i;
@@ -223,8 +223,8 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev)
}
}

-static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev);
-static void vfio_pci_disable(struct vfio_pci_device *vdev);
+static void vfio_pci_try_bus_reset(struct vfio_pci_core_device *vdev);
+static void vfio_pci_disable(struct vfio_pci_core_device *vdev);
static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data);

/*
@@ -258,7 +258,7 @@ static bool vfio_pci_nointx(struct pci_dev *pdev)
return false;
}

-static void vfio_pci_probe_power_state(struct vfio_pci_device *vdev)
+static void vfio_pci_probe_power_state(struct vfio_pci_core_device *vdev)
{
struct pci_dev *pdev = vdev->pdev;
u16 pmcsr;
@@ -278,7 +278,7 @@ static void vfio_pci_probe_power_state(struct vfio_pci_device *vdev)
* by PM capability emulation and separately from pci_dev internal saved state
* to avoid it being overwritten and consumed around other resets.
*/
-int vfio_pci_set_power_state(struct vfio_pci_device *vdev, pci_power_t state)
+int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, pci_power_t state)
{
struct pci_dev *pdev = vdev->pdev;
bool needs_restore = false, needs_save = false;
@@ -309,7 +309,7 @@ int vfio_pci_set_power_state(struct vfio_pci_device *vdev, pci_power_t state)
return ret;
}

-static int vfio_pci_enable(struct vfio_pci_device *vdev)
+static int vfio_pci_enable(struct vfio_pci_core_device *vdev)
{
struct pci_dev *pdev = vdev->pdev;
int ret;
@@ -397,7 +397,7 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
return ret;
}

-static void vfio_pci_disable(struct vfio_pci_device *vdev)
+static void vfio_pci_disable(struct vfio_pci_core_device *vdev)
{
struct pci_dev *pdev = vdev->pdev;
struct vfio_pci_dummy_resource *dummy_res, *tmp;
@@ -498,7 +498,7 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)

static struct pci_driver vfio_pci_driver;

-static struct vfio_pci_device *get_pf_vdev(struct vfio_pci_device *vdev)
+static struct vfio_pci_core_device *get_pf_vdev(struct vfio_pci_core_device *vdev)
{
struct pci_dev *physfn = pci_physfn(vdev->pdev);
struct vfio_device *pf_dev;
@@ -515,12 +515,12 @@ static struct vfio_pci_device *get_pf_vdev(struct vfio_pci_device *vdev)
return NULL;
}

- return container_of(pf_dev, struct vfio_pci_device, vdev);
+ return container_of(pf_dev, struct vfio_pci_core_device, vdev);
}

-static void vfio_pci_vf_token_user_add(struct vfio_pci_device *vdev, int val)
+static void vfio_pci_vf_token_user_add(struct vfio_pci_core_device *vdev, int val)
{
- struct vfio_pci_device *pf_vdev = get_pf_vdev(vdev);
+ struct vfio_pci_core_device *pf_vdev = get_pf_vdev(vdev);

if (!pf_vdev)
return;
@@ -535,8 +535,8 @@ static void vfio_pci_vf_token_user_add(struct vfio_pci_device *vdev, int val)

static void vfio_pci_release(struct vfio_device *core_vdev)
{
- struct vfio_pci_device *vdev =
- container_of(core_vdev, struct vfio_pci_device, vdev);
+ struct vfio_pci_core_device *vdev =
+ container_of(core_vdev, struct vfio_pci_core_device, vdev);

lockdep_assert_held(&core_vdev->reflck->lock);

@@ -558,8 +558,8 @@ static void vfio_pci_release(struct vfio_device *core_vdev)

static int vfio_pci_open(struct vfio_device *core_vdev)
{
- struct vfio_pci_device *vdev =
- container_of(core_vdev, struct vfio_pci_device, vdev);
+ struct vfio_pci_core_device *vdev =
+ container_of(core_vdev, struct vfio_pci_core_device, vdev);
int ret;

lockdep_assert_held(&core_vdev->reflck->lock);
@@ -573,7 +573,7 @@ static int vfio_pci_open(struct vfio_device *core_vdev)
return ret;
}

-static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
+static int vfio_pci_get_irq_count(struct vfio_pci_core_device *vdev, int irq_type)
{
if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) {
u8 pin;
@@ -720,7 +720,7 @@ static int vfio_pci_for_each_slot_or_bus(struct pci_dev *pdev,
return walk.ret;
}

-static int msix_mmappable_cap(struct vfio_pci_device *vdev,
+static int msix_mmappable_cap(struct vfio_pci_core_device *vdev,
struct vfio_info_cap *caps)
{
struct vfio_info_cap_header header = {
@@ -731,7 +731,7 @@ static int msix_mmappable_cap(struct vfio_pci_device *vdev,
return vfio_info_add_capability(caps, &header, sizeof(header));
}

-int vfio_pci_register_dev_region(struct vfio_pci_device *vdev,
+int vfio_pci_register_dev_region(struct vfio_pci_core_device *vdev,
unsigned int type, unsigned int subtype,
const struct vfio_pci_regops *ops,
size_t size, u32 flags, void *data)
@@ -758,7 +758,7 @@ int vfio_pci_register_dev_region(struct vfio_pci_device *vdev,
}

struct vfio_devices {
- struct vfio_pci_device **devices;
+ struct vfio_pci_core_device **devices;
int cur_index;
int max_index;
};
@@ -766,8 +766,8 @@ struct vfio_devices {
static long vfio_pci_ioctl(struct vfio_device *core_vdev,
unsigned int cmd, unsigned long arg)
{
- struct vfio_pci_device *vdev =
- container_of(core_vdev, struct vfio_pci_device, vdev);
+ struct vfio_pci_core_device *vdev =
+ container_of(core_vdev, struct vfio_pci_core_device, vdev);
unsigned long minsz;

if (cmd == VFIO_DEVICE_GET_INFO) {
@@ -1247,7 +1247,7 @@ static long vfio_pci_ioctl(struct vfio_device *core_vdev,
goto hot_reset_release;

for (; mem_idx < devs.cur_index; mem_idx++) {
- struct vfio_pci_device *tmp = devs.devices[mem_idx];
+ struct vfio_pci_core_device *tmp = devs.devices[mem_idx];

ret = down_write_trylock(&tmp->memory_lock);
if (!ret) {
@@ -1262,7 +1262,7 @@ static long vfio_pci_ioctl(struct vfio_device *core_vdev,

hot_reset_release:
for (i = 0; i < devs.cur_index; i++) {
- struct vfio_pci_device *tmp = devs.devices[i];
+ struct vfio_pci_core_device *tmp = devs.devices[i];

if (i < mem_idx)
up_write(&tmp->memory_lock);
@@ -1363,7 +1363,7 @@ static long vfio_pci_ioctl(struct vfio_device *core_vdev,
return -ENOTTY;
}

-static ssize_t vfio_pci_rw(struct vfio_pci_device *vdev, char __user *buf,
+static ssize_t vfio_pci_rw(struct vfio_pci_core_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite)
{
unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
@@ -1397,8 +1397,8 @@ static ssize_t vfio_pci_rw(struct vfio_pci_device *vdev, char __user *buf,
static ssize_t vfio_pci_read(struct vfio_device *core_vdev, char __user *buf,
size_t count, loff_t *ppos)
{
- struct vfio_pci_device *vdev =
- container_of(core_vdev, struct vfio_pci_device, vdev);
+ struct vfio_pci_core_device *vdev =
+ container_of(core_vdev, struct vfio_pci_core_device, vdev);

if (!count)
return 0;
@@ -1409,8 +1409,8 @@ static ssize_t vfio_pci_read(struct vfio_device *core_vdev, char __user *buf,
static ssize_t vfio_pci_write(struct vfio_device *core_vdev, const char __user *buf,
size_t count, loff_t *ppos)
{
- struct vfio_pci_device *vdev =
- container_of(core_vdev, struct vfio_pci_device, vdev);
+ struct vfio_pci_core_device *vdev =
+ container_of(core_vdev, struct vfio_pci_core_device, vdev);

if (!count)
return 0;
@@ -1419,7 +1419,7 @@ static ssize_t vfio_pci_write(struct vfio_device *core_vdev, const char __user *
}

/* Return 1 on zap and vma_lock acquired, 0 on contention (only with @try) */
-static int vfio_pci_zap_and_vma_lock(struct vfio_pci_device *vdev, bool try)
+static int vfio_pci_zap_and_vma_lock(struct vfio_pci_core_device *vdev, bool try)
{
struct vfio_pci_mmap_vma *mmap_vma, *tmp;

@@ -1507,14 +1507,14 @@ static int vfio_pci_zap_and_vma_lock(struct vfio_pci_device *vdev, bool try)
}
}

-void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_device *vdev)
+void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_core_device *vdev)
{
vfio_pci_zap_and_vma_lock(vdev, false);
down_write(&vdev->memory_lock);
mutex_unlock(&vdev->vma_lock);
}

-u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_device *vdev)
+u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_core_device *vdev)
{
u16 cmd;

@@ -1527,14 +1527,14 @@ u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_device *vdev)
return cmd;
}

-void vfio_pci_memory_unlock_and_restore(struct vfio_pci_device *vdev, u16 cmd)
+void vfio_pci_memory_unlock_and_restore(struct vfio_pci_core_device *vdev, u16 cmd)
{
pci_write_config_word(vdev->pdev, PCI_COMMAND, cmd);
up_write(&vdev->memory_lock);
}

/* Caller holds vma_lock */
-static int __vfio_pci_add_vma(struct vfio_pci_device *vdev,
+static int __vfio_pci_add_vma(struct vfio_pci_core_device *vdev,
struct vm_area_struct *vma)
{
struct vfio_pci_mmap_vma *mmap_vma;
@@ -1560,7 +1560,7 @@ static void vfio_pci_mmap_open(struct vm_area_struct *vma)

static void vfio_pci_mmap_close(struct vm_area_struct *vma)
{
- struct vfio_pci_device *vdev = vma->vm_private_data;
+ struct vfio_pci_core_device *vdev = vma->vm_private_data;
struct vfio_pci_mmap_vma *mmap_vma;

mutex_lock(&vdev->vma_lock);
@@ -1577,7 +1577,7 @@ static void vfio_pci_mmap_close(struct vm_area_struct *vma)
static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
- struct vfio_pci_device *vdev = vma->vm_private_data;
+ struct vfio_pci_core_device *vdev = vma->vm_private_data;
vm_fault_t ret = VM_FAULT_NOPAGE;

mutex_lock(&vdev->vma_lock);
@@ -1614,8 +1614,8 @@ static const struct vm_operations_struct vfio_pci_mmap_ops = {

static int vfio_pci_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma)
{
- struct vfio_pci_device *vdev =
- container_of(core_vdev, struct vfio_pci_device, vdev);
+ struct vfio_pci_core_device *vdev =
+ container_of(core_vdev, struct vfio_pci_core_device, vdev);
struct pci_dev *pdev = vdev->pdev;
unsigned int index;
u64 phys_len, req_len, pgoff, req_start;
@@ -1685,8 +1685,8 @@ static int vfio_pci_mmap(struct vfio_device *core_vdev, struct vm_area_struct *v

static void vfio_pci_request(struct vfio_device *core_vdev, unsigned int count)
{
- struct vfio_pci_device *vdev =
- container_of(core_vdev, struct vfio_pci_device, vdev);
+ struct vfio_pci_core_device *vdev =
+ container_of(core_vdev, struct vfio_pci_core_device, vdev);
struct pci_dev *pdev = vdev->pdev;

mutex_lock(&vdev->igate);
@@ -1705,7 +1705,7 @@ static void vfio_pci_request(struct vfio_device *core_vdev, unsigned int count)
mutex_unlock(&vdev->igate);
}

-static int vfio_pci_validate_vf_token(struct vfio_pci_device *vdev,
+static int vfio_pci_validate_vf_token(struct vfio_pci_core_device *vdev,
bool vf_token, uuid_t *uuid)
{
/*
@@ -1737,7 +1737,7 @@ static int vfio_pci_validate_vf_token(struct vfio_pci_device *vdev,
return 0; /* No VF token provided or required */

if (vdev->pdev->is_virtfn) {
- struct vfio_pci_device *pf_vdev = get_pf_vdev(vdev);
+ struct vfio_pci_core_device *pf_vdev = get_pf_vdev(vdev);
bool match;

if (!pf_vdev) {
@@ -1801,8 +1801,8 @@ static int vfio_pci_validate_vf_token(struct vfio_pci_device *vdev,

static int vfio_pci_match(struct vfio_device *core_vdev, char *buf)
{
- struct vfio_pci_device *vdev =
- container_of(core_vdev, struct vfio_pci_device, vdev);
+ struct vfio_pci_core_device *vdev =
+ container_of(core_vdev, struct vfio_pci_core_device, vdev);
bool vf_token = false;
uuid_t uuid;
int ret;
@@ -1876,8 +1876,8 @@ static int vfio_pci_reflck_find(struct pci_dev *pdev, void *data)

static int vfio_pci_reflck_attach(struct vfio_device *core_vdev)
{
- struct vfio_pci_device *vdev =
- container_of(core_vdev, struct vfio_pci_device, vdev);
+ struct vfio_pci_core_device *vdev =
+ container_of(core_vdev, struct vfio_pci_core_device, vdev);
bool slot = !pci_probe_reset_slot(vdev->pdev->slot);

if (pci_is_root_bus(vdev->pdev->bus) ||
@@ -1904,8 +1904,8 @@ static const struct vfio_device_ops vfio_pci_ops = {
static int vfio_pci_bus_notifier(struct notifier_block *nb,
unsigned long action, void *data)
{
- struct vfio_pci_device *vdev = container_of(nb,
- struct vfio_pci_device, nb);
+ struct vfio_pci_core_device *vdev = container_of(nb,
+ struct vfio_pci_core_device, nb);
struct device *dev = data;
struct pci_dev *pdev = to_pci_dev(dev);
struct pci_dev *physfn = pci_physfn(pdev);
@@ -1929,7 +1929,7 @@ static int vfio_pci_bus_notifier(struct notifier_block *nb,
return 0;
}

-static int vfio_pci_vf_init(struct vfio_pci_device *vdev)
+static int vfio_pci_vf_init(struct vfio_pci_core_device *vdev)
{
struct pci_dev *pdev = vdev->pdev;
int ret;
@@ -1953,7 +1953,7 @@ static int vfio_pci_vf_init(struct vfio_pci_device *vdev)
return 0;
}

-static void vfio_pci_vf_uninit(struct vfio_pci_device *vdev)
+static void vfio_pci_vf_uninit(struct vfio_pci_core_device *vdev)
{
if (!vdev->vf_token)
return;
@@ -1964,7 +1964,7 @@ static void vfio_pci_vf_uninit(struct vfio_pci_device *vdev)
kfree(vdev->vf_token);
}

-static int vfio_pci_vga_init(struct vfio_pci_device *vdev)
+static int vfio_pci_vga_init(struct vfio_pci_core_device *vdev)
{
struct pci_dev *pdev = vdev->pdev;
int ret;
@@ -1979,7 +1979,7 @@ static int vfio_pci_vga_init(struct vfio_pci_device *vdev)
return 0;
}

-static void vfio_pci_vga_uninit(struct vfio_pci_device *vdev)
+static void vfio_pci_vga_uninit(struct vfio_pci_core_device *vdev)
{
struct pci_dev *pdev = vdev->pdev;

@@ -1993,7 +1993,7 @@ static void vfio_pci_vga_uninit(struct vfio_pci_device *vdev)

static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
- struct vfio_pci_device *vdev;
+ struct vfio_pci_core_device *vdev;
struct iommu_group *group;
int ret;

@@ -2086,7 +2086,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)

static void vfio_pci_remove(struct pci_dev *pdev)
{
- struct vfio_pci_device *vdev = dev_get_drvdata(&pdev->dev);
+ struct vfio_pci_core_device *vdev = dev_get_drvdata(&pdev->dev);

pci_disable_sriov(pdev);

@@ -2110,14 +2110,14 @@ static void vfio_pci_remove(struct pci_dev *pdev)
static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
pci_channel_state_t state)
{
- struct vfio_pci_device *vdev;
+ struct vfio_pci_core_device *vdev;
struct vfio_device *device;

device = vfio_device_get_from_dev(&pdev->dev);
if (device == NULL)
return PCI_ERS_RESULT_DISCONNECT;

- vdev = container_of(device, struct vfio_pci_device, vdev);
+ vdev = container_of(device, struct vfio_pci_core_device, vdev);

mutex_lock(&vdev->igate);

@@ -2172,7 +2172,7 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data)
{
struct vfio_devices *devs = data;
struct vfio_device *device;
- struct vfio_pci_device *vdev;
+ struct vfio_pci_core_device *vdev;

if (devs->cur_index == devs->max_index)
return -ENOSPC;
@@ -2186,7 +2186,7 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data)
return -EBUSY;
}

- vdev = container_of(device, struct vfio_pci_device, vdev);
+ vdev = container_of(device, struct vfio_pci_core_device, vdev);

/* Fault if the device is not unused */
if (device->refcnt) {
@@ -2202,7 +2202,7 @@ static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data)
{
struct vfio_devices *devs = data;
struct vfio_device *device;
- struct vfio_pci_device *vdev;
+ struct vfio_pci_core_device *vdev;

if (devs->cur_index == devs->max_index)
return -ENOSPC;
@@ -2216,7 +2216,7 @@ static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data)
return -EBUSY;
}

- vdev = container_of(device, struct vfio_pci_device, vdev);
+ vdev = container_of(device, struct vfio_pci_core_device, vdev);

/*
* Locking multiple devices is prone to deadlock, runaway and
@@ -2247,12 +2247,12 @@ static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data)
* to be bound to vfio_pci since that's the only way we can be sure they
* stay put.
*/
-static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev)
+static void vfio_pci_try_bus_reset(struct vfio_pci_core_device *vdev)
{
struct vfio_devices devs = { .cur_index = 0 };
int i = 0, ret = -EINVAL;
bool slot = false;
- struct vfio_pci_device *tmp;
+ struct vfio_pci_core_device *tmp;

if (!pci_probe_reset_slot(vdev->pdev->slot))
slot = true;
diff --git a/drivers/vfio/pci/vfio_pci_core.h b/drivers/vfio/pci/vfio_pci_core.h
index b73abba881e9..9aa558672b83 100644
--- a/drivers/vfio/pci/vfio_pci_core.h
+++ b/drivers/vfio/pci/vfio_pci_core.h
@@ -33,7 +33,7 @@

struct vfio_pci_ioeventfd {
struct list_head next;
- struct vfio_pci_device *vdev;
+ struct vfio_pci_core_device *vdev;
struct virqfd *virqfd;
void __iomem *addr;
uint64_t data;
@@ -52,18 +52,18 @@ struct vfio_pci_irq_ctx {
struct irq_bypass_producer producer;
};

-struct vfio_pci_device;
+struct vfio_pci_core_device;
struct vfio_pci_region;

struct vfio_pci_regops {
- size_t (*rw)(struct vfio_pci_device *vdev, char __user *buf,
+ size_t (*rw)(struct vfio_pci_core_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite);
- void (*release)(struct vfio_pci_device *vdev,
+ void (*release)(struct vfio_pci_core_device *vdev,
struct vfio_pci_region *region);
- int (*mmap)(struct vfio_pci_device *vdev,
+ int (*mmap)(struct vfio_pci_core_device *vdev,
struct vfio_pci_region *region,
struct vm_area_struct *vma);
- int (*add_capability)(struct vfio_pci_device *vdev,
+ int (*add_capability)(struct vfio_pci_core_device *vdev,
struct vfio_pci_region *region,
struct vfio_info_cap *caps);
};
@@ -94,7 +94,7 @@ struct vfio_pci_mmap_vma {
struct list_head vma_next;
};

-struct vfio_pci_device {
+struct vfio_pci_core_device {
struct vfio_device vdev;
struct pci_dev *pdev;
void __iomem *barmap[PCI_STD_NUM_BARS];
@@ -144,61 +144,61 @@ struct vfio_pci_device {
#define is_irq_none(vdev) (!(is_intx(vdev) || is_msi(vdev) || is_msix(vdev)))
#define irq_is(vdev, type) (vdev->irq_type == type)

-extern void vfio_pci_intx_mask(struct vfio_pci_device *vdev);
-extern void vfio_pci_intx_unmask(struct vfio_pci_device *vdev);
+extern void vfio_pci_intx_mask(struct vfio_pci_core_device *vdev);
+extern void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev);

-extern int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev,
+extern int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev,
uint32_t flags, unsigned index,
unsigned start, unsigned count, void *data);

-extern ssize_t vfio_pci_config_rw(struct vfio_pci_device *vdev,
+extern ssize_t vfio_pci_config_rw(struct vfio_pci_core_device *vdev,
char __user *buf, size_t count,
loff_t *ppos, bool iswrite);

-extern ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
+extern ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite);

-extern ssize_t vfio_pci_vga_rw(struct vfio_pci_device *vdev, char __user *buf,
+extern ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite);

-extern long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
+extern long vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
uint64_t data, int count, int fd);

extern int vfio_pci_init_perm_bits(void);
extern void vfio_pci_uninit_perm_bits(void);

-extern int vfio_config_init(struct vfio_pci_device *vdev);
-extern void vfio_config_free(struct vfio_pci_device *vdev);
+extern int vfio_config_init(struct vfio_pci_core_device *vdev);
+extern void vfio_config_free(struct vfio_pci_core_device *vdev);

-extern int vfio_pci_register_dev_region(struct vfio_pci_device *vdev,
+extern int vfio_pci_register_dev_region(struct vfio_pci_core_device *vdev,
unsigned int type, unsigned int subtype,
const struct vfio_pci_regops *ops,
size_t size, u32 flags, void *data);

-extern int vfio_pci_set_power_state(struct vfio_pci_device *vdev,
+extern int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev,
pci_power_t state);

-extern bool __vfio_pci_memory_enabled(struct vfio_pci_device *vdev);
-extern void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_device
+extern bool __vfio_pci_memory_enabled(struct vfio_pci_core_device *vdev);
+extern void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_core_device
*vdev);
-extern u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_device *vdev);
-extern void vfio_pci_memory_unlock_and_restore(struct vfio_pci_device *vdev,
+extern u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_core_device *vdev);
+extern void vfio_pci_memory_unlock_and_restore(struct vfio_pci_core_device *vdev,
u16 cmd);

#ifdef CONFIG_VFIO_PCI_IGD
-extern int vfio_pci_igd_init(struct vfio_pci_device *vdev);
+extern int vfio_pci_igd_init(struct vfio_pci_core_device *vdev);
#else
-static inline int vfio_pci_igd_init(struct vfio_pci_device *vdev)
+static inline int vfio_pci_igd_init(struct vfio_pci_core_device *vdev)
{
return -ENODEV;
}
#endif

#ifdef CONFIG_S390
-extern int vfio_pci_info_zdev_add_caps(struct vfio_pci_device *vdev,
+extern int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
struct vfio_info_cap *caps);
#else
-static inline int vfio_pci_info_zdev_add_caps(struct vfio_pci_device *vdev,
+static inline int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
struct vfio_info_cap *caps)
{
return -ENODEV;
diff --git a/drivers/vfio/pci/vfio_pci_igd.c b/drivers/vfio/pci/vfio_pci_igd.c
index 214b6d629b21..2295eaac4bc9 100644
--- a/drivers/vfio/pci/vfio_pci_igd.c
+++ b/drivers/vfio/pci/vfio_pci_igd.c
@@ -25,7 +25,7 @@
#define OPREGION_RVDS 0x3c2
#define OPREGION_VERSION 0x16

-static size_t vfio_pci_igd_rw(struct vfio_pci_device *vdev, char __user *buf,
+static size_t vfio_pci_igd_rw(struct vfio_pci_core_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite)
{
unsigned int i = VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS;
@@ -45,7 +45,7 @@ static size_t vfio_pci_igd_rw(struct vfio_pci_device *vdev, char __user *buf,
return count;
}

-static void vfio_pci_igd_release(struct vfio_pci_device *vdev,
+static void vfio_pci_igd_release(struct vfio_pci_core_device *vdev,
struct vfio_pci_region *region)
{
memunmap(region->data);
@@ -56,7 +56,7 @@ static const struct vfio_pci_regops vfio_pci_igd_regops = {
.release = vfio_pci_igd_release,
};

-static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
+static int vfio_pci_igd_opregion_init(struct vfio_pci_core_device *vdev)
{
__le32 *dwordp = (__le32 *)(vdev->vconfig + OPREGION_PCI_ADDR);
u32 addr, size;
@@ -160,7 +160,7 @@ static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
return ret;
}

-static size_t vfio_pci_igd_cfg_rw(struct vfio_pci_device *vdev,
+static size_t vfio_pci_igd_cfg_rw(struct vfio_pci_core_device *vdev,
char __user *buf, size_t count, loff_t *ppos,
bool iswrite)
{
@@ -253,7 +253,7 @@ static size_t vfio_pci_igd_cfg_rw(struct vfio_pci_device *vdev,
return count;
}

-static void vfio_pci_igd_cfg_release(struct vfio_pci_device *vdev,
+static void vfio_pci_igd_cfg_release(struct vfio_pci_core_device *vdev,
struct vfio_pci_region *region)
{
struct pci_dev *pdev = region->data;
@@ -266,7 +266,7 @@ static const struct vfio_pci_regops vfio_pci_igd_cfg_regops = {
.release = vfio_pci_igd_cfg_release,
};

-static int vfio_pci_igd_cfg_init(struct vfio_pci_device *vdev)
+static int vfio_pci_igd_cfg_init(struct vfio_pci_core_device *vdev)
{
struct pci_dev *host_bridge, *lpc_bridge;
int ret;
@@ -314,7 +314,7 @@ static int vfio_pci_igd_cfg_init(struct vfio_pci_device *vdev)
return 0;
}

-int vfio_pci_igd_init(struct vfio_pci_device *vdev)
+int vfio_pci_igd_init(struct vfio_pci_core_device *vdev)
{
int ret;

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index df1e8c8c274c..945ddbdf4d11 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -27,13 +27,13 @@
*/
static void vfio_send_intx_eventfd(void *opaque, void *unused)
{
- struct vfio_pci_device *vdev = opaque;
+ struct vfio_pci_core_device *vdev = opaque;

if (likely(is_intx(vdev) && !vdev->virq_disabled))
eventfd_signal(vdev->ctx[0].trigger, 1);
}

-void vfio_pci_intx_mask(struct vfio_pci_device *vdev)
+void vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
{
struct pci_dev *pdev = vdev->pdev;
unsigned long flags;
@@ -73,7 +73,7 @@ void vfio_pci_intx_mask(struct vfio_pci_device *vdev)
*/
static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
{
- struct vfio_pci_device *vdev = opaque;
+ struct vfio_pci_core_device *vdev = opaque;
struct pci_dev *pdev = vdev->pdev;
unsigned long flags;
int ret = 0;
@@ -107,7 +107,7 @@ static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
return ret;
}

-void vfio_pci_intx_unmask(struct vfio_pci_device *vdev)
+void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev)
{
if (vfio_pci_intx_unmask_handler(vdev, NULL) > 0)
vfio_send_intx_eventfd(vdev, NULL);
@@ -115,7 +115,7 @@ void vfio_pci_intx_unmask(struct vfio_pci_device *vdev)

static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
{
- struct vfio_pci_device *vdev = dev_id;
+ struct vfio_pci_core_device *vdev = dev_id;
unsigned long flags;
int ret = IRQ_NONE;

@@ -139,7 +139,7 @@ static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
return ret;
}

-static int vfio_intx_enable(struct vfio_pci_device *vdev)
+static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
{
if (!is_irq_none(vdev))
return -EINVAL;
@@ -168,7 +168,7 @@ static int vfio_intx_enable(struct vfio_pci_device *vdev)
return 0;
}

-static int vfio_intx_set_signal(struct vfio_pci_device *vdev, int fd)
+static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd)
{
struct pci_dev *pdev = vdev->pdev;
unsigned long irqflags = IRQF_SHARED;
@@ -223,7 +223,7 @@ static int vfio_intx_set_signal(struct vfio_pci_device *vdev, int fd)
return 0;
}

-static void vfio_intx_disable(struct vfio_pci_device *vdev)
+static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
{
vfio_virqfd_disable(&vdev->ctx[0].unmask);
vfio_virqfd_disable(&vdev->ctx[0].mask);
@@ -244,7 +244,7 @@ static irqreturn_t vfio_msihandler(int irq, void *arg)
return IRQ_HANDLED;
}

-static int vfio_msi_enable(struct vfio_pci_device *vdev, int nvec, bool msix)
+static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msix)
{
struct pci_dev *pdev = vdev->pdev;
unsigned int flag = msix ? PCI_IRQ_MSIX : PCI_IRQ_MSI;
@@ -285,7 +285,7 @@ static int vfio_msi_enable(struct vfio_pci_device *vdev, int nvec, bool msix)
return 0;
}

-static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
+static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
int vector, int fd, bool msix)
{
struct pci_dev *pdev = vdev->pdev;
@@ -364,7 +364,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
return 0;
}

-static int vfio_msi_set_block(struct vfio_pci_device *vdev, unsigned start,
+static int vfio_msi_set_block(struct vfio_pci_core_device *vdev, unsigned start,
unsigned count, int32_t *fds, bool msix)
{
int i, j, ret = 0;
@@ -385,7 +385,7 @@ static int vfio_msi_set_block(struct vfio_pci_device *vdev, unsigned start,
return ret;
}

-static void vfio_msi_disable(struct vfio_pci_device *vdev, bool msix)
+static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
{
struct pci_dev *pdev = vdev->pdev;
int i;
@@ -417,7 +417,7 @@ static void vfio_msi_disable(struct vfio_pci_device *vdev, bool msix)
/*
* IOCTL support
*/
-static int vfio_pci_set_intx_unmask(struct vfio_pci_device *vdev,
+static int vfio_pci_set_intx_unmask(struct vfio_pci_core_device *vdev,
unsigned index, unsigned start,
unsigned count, uint32_t flags, void *data)
{
@@ -444,7 +444,7 @@ static int vfio_pci_set_intx_unmask(struct vfio_pci_device *vdev,
return 0;
}

-static int vfio_pci_set_intx_mask(struct vfio_pci_device *vdev,
+static int vfio_pci_set_intx_mask(struct vfio_pci_core_device *vdev,
unsigned index, unsigned start,
unsigned count, uint32_t flags, void *data)
{
@@ -464,7 +464,7 @@ static int vfio_pci_set_intx_mask(struct vfio_pci_device *vdev,
return 0;
}

-static int vfio_pci_set_intx_trigger(struct vfio_pci_device *vdev,
+static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
unsigned index, unsigned start,
unsigned count, uint32_t flags, void *data)
{
@@ -507,7 +507,7 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_device *vdev,
return 0;
}

-static int vfio_pci_set_msi_trigger(struct vfio_pci_device *vdev,
+static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
unsigned index, unsigned start,
unsigned count, uint32_t flags, void *data)
{
@@ -613,7 +613,7 @@ static int vfio_pci_set_ctx_trigger_single(struct eventfd_ctx **ctx,
return -EINVAL;
}

-static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev,
+static int vfio_pci_set_err_trigger(struct vfio_pci_core_device *vdev,
unsigned index, unsigned start,
unsigned count, uint32_t flags, void *data)
{
@@ -624,7 +624,7 @@ static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev,
count, flags, data);
}

-static int vfio_pci_set_req_trigger(struct vfio_pci_device *vdev,
+static int vfio_pci_set_req_trigger(struct vfio_pci_core_device *vdev,
unsigned index, unsigned start,
unsigned count, uint32_t flags, void *data)
{
@@ -635,11 +635,11 @@ static int vfio_pci_set_req_trigger(struct vfio_pci_device *vdev,
count, flags, data);
}

-int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags,
+int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
unsigned index, unsigned start, unsigned count,
void *data)
{
- int (*func)(struct vfio_pci_device *vdev, unsigned index,
+ int (*func)(struct vfio_pci_core_device *vdev, unsigned index,
unsigned start, unsigned count, uint32_t flags,
void *data) = NULL;

diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 667e82726e75..8fff4689dd44 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -38,7 +38,7 @@
#define vfio_iowrite8 iowrite8

#define VFIO_IOWRITE(size) \
-static int vfio_pci_iowrite##size(struct vfio_pci_device *vdev, \
+static int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev, \
bool test_mem, u##size val, void __iomem *io) \
{ \
if (test_mem) { \
@@ -65,7 +65,7 @@ VFIO_IOWRITE(64)
#endif

#define VFIO_IOREAD(size) \
-static int vfio_pci_ioread##size(struct vfio_pci_device *vdev, \
+static int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev, \
bool test_mem, u##size *val, void __iomem *io) \
{ \
if (test_mem) { \
@@ -94,7 +94,7 @@ VFIO_IOREAD(32)
* reads with -1. This is intended for handling MSI-X vector tables and
* leftover space for ROM BARs.
*/
-static ssize_t do_io_rw(struct vfio_pci_device *vdev, bool test_mem,
+static ssize_t do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
void __iomem *io, char __user *buf,
loff_t off, size_t count, size_t x_start,
size_t x_end, bool iswrite)
@@ -200,7 +200,7 @@ static ssize_t do_io_rw(struct vfio_pci_device *vdev, bool test_mem,
return done;
}

-static int vfio_pci_setup_barmap(struct vfio_pci_device *vdev, int bar)
+static int vfio_pci_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
{
struct pci_dev *pdev = vdev->pdev;
int ret;
@@ -224,7 +224,7 @@ static int vfio_pci_setup_barmap(struct vfio_pci_device *vdev, int bar)
return 0;
}

-ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
+ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite)
{
struct pci_dev *pdev = vdev->pdev;
@@ -288,7 +288,7 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
return done;
}

-ssize_t vfio_pci_vga_rw(struct vfio_pci_device *vdev, char __user *buf,
+ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite)
{
int ret;
@@ -384,7 +384,7 @@ static void vfio_pci_ioeventfd_do_write(struct vfio_pci_ioeventfd *ioeventfd,
static int vfio_pci_ioeventfd_handler(void *opaque, void *unused)
{
struct vfio_pci_ioeventfd *ioeventfd = opaque;
- struct vfio_pci_device *vdev = ioeventfd->vdev;
+ struct vfio_pci_core_device *vdev = ioeventfd->vdev;

if (ioeventfd->test_mem) {
if (!down_read_trylock(&vdev->memory_lock))
@@ -410,7 +410,7 @@ static void vfio_pci_ioeventfd_thread(void *opaque, void *unused)
vfio_pci_ioeventfd_do_write(ioeventfd, ioeventfd->test_mem);
}

-long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
+long vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
uint64_t data, int count, int fd)
{
struct pci_dev *pdev = vdev->pdev;
diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
index ecae0c3d95a0..2ffbdc11f089 100644
--- a/drivers/vfio/pci/vfio_pci_zdev.c
+++ b/drivers/vfio/pci/vfio_pci_zdev.c
@@ -114,7 +114,7 @@ static int zpci_pfip_cap(struct zpci_dev *zdev, struct vfio_info_cap *caps)
/*
* Add all supported capabilities to the VFIO_DEVICE_GET_INFO capability chain.
*/
-int vfio_pci_info_zdev_add_caps(struct vfio_pci_device *vdev,
+int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
struct vfio_info_cap *caps)
{
struct zpci_dev *zdev = to_zpci(vdev->pdev);
--
2.21.0

2021-06-03 16:11:10

by Max Gurtovoy

[permalink] [raw]
Subject: [PATCH 05/11] vfio-pci: include vfio header in vfio_pci_core.h

The vfio_device structure is embedded into the vfio_pci_core_device
structure, so there is no reason for not including the header file in
the vfio_pci_core header as well.

Signed-off-by: Max Gurtovoy <[email protected]>
---
drivers/vfio/pci/vfio_pci_core.c | 1 -
drivers/vfio/pci/vfio_pci_core.h | 1 +
2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 2d2fa64cc8a0..61cd785c80e8 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -23,7 +23,6 @@
#include <linux/slab.h>
#include <linux/types.h>
#include <linux/uaccess.h>
-#include <linux/vfio.h>
#include <linux/vgaarb.h>
#include <linux/nospec.h>
#include <linux/sched/mm.h>
diff --git a/drivers/vfio/pci/vfio_pci_core.h b/drivers/vfio/pci/vfio_pci_core.h
index 9aa558672b83..cb24d229df66 100644
--- a/drivers/vfio/pci/vfio_pci_core.h
+++ b/drivers/vfio/pci/vfio_pci_core.h
@@ -10,6 +10,7 @@

#include <linux/mutex.h>
#include <linux/pci.h>
+#include <linux/vfio.h>
#include <linux/irqbypass.h>
#include <linux/types.h>
#include <linux/uuid.h>
--
2.21.0

2021-06-03 16:11:33

by Max Gurtovoy

[permalink] [raw]
Subject: [PATCH 07/11] vfio-pci: move igd initialization to vfio_pci.c

This is a preparation before splitting vfio_pci.ko to 2 drivers. Move
the vendor specific igd initialization from the core part to pci_driver
part.

Signed-off-by: Max Gurtovoy <[email protected]>
---
drivers/vfio/pci/vfio_pci.c | 31 ++++++++++++++++++++-
drivers/vfio/pci/vfio_pci_core.c | 46 +++++++++++---------------------
drivers/vfio/pci/vfio_pci_core.h | 8 ++++++
3 files changed, 54 insertions(+), 31 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 23a21ecbc674..850ea3a94e28 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -83,9 +83,38 @@ static bool vfio_pci_is_denylisted(struct pci_dev *pdev)
return true;
}

+static int vfio_pci_open(struct vfio_device *core_vdev)
+{
+ struct vfio_pci_core_device *vdev =
+ container_of(core_vdev, struct vfio_pci_core_device, vdev);
+ struct pci_dev *pdev = vdev->pdev;
+ int ret;
+
+ lockdep_assert_held(&core_vdev->reflck->lock);
+
+ ret = vfio_pci_core_enable(vdev);
+ if (ret)
+ return ret;
+
+ if (vfio_pci_is_vga(pdev) &&
+ pdev->vendor == PCI_VENDOR_ID_INTEL &&
+ IS_ENABLED(CONFIG_VFIO_PCI_IGD)) {
+ ret = vfio_pci_igd_init(vdev);
+ if (ret && ret != -ENODEV) {
+ pci_warn(pdev, "Failed to setup Intel IGD regions\n");
+ vfio_pci_core_disable(vdev);
+ return ret;
+ }
+ }
+
+ vfio_pci_core_finish_enable(vdev);
+
+ return 0;
+}
+
static const struct vfio_device_ops vfio_pci_ops = {
.name = "vfio-pci",
- .open = vfio_pci_core_open,
+ .open = vfio_pci_open,
.release = vfio_pci_core_release,
.ioctl = vfio_pci_core_ioctl,
.read = vfio_pci_core_read,
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 12d5392c78cc..39a3f18bbc08 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -91,11 +91,6 @@ static unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga)
return decodes;
}

-static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
-{
- return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
-}
-
static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev)
{
struct resource *res;
@@ -165,7 +160,6 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev)
}

static void vfio_pci_try_bus_reset(struct vfio_pci_core_device *vdev);
-static void vfio_pci_disable(struct vfio_pci_core_device *vdev);
static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data);

/*
@@ -250,7 +244,7 @@ int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, pci_power_t stat
return ret;
}

-static int vfio_pci_enable(struct vfio_pci_core_device *vdev)
+int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
{
struct pci_dev *pdev = vdev->pdev;
int ret;
@@ -319,26 +313,11 @@ static int vfio_pci_enable(struct vfio_pci_core_device *vdev)
if (!vfio_vga_disabled() && vfio_pci_is_vga(pdev))
vdev->has_vga = true;

- if (vfio_pci_is_vga(pdev) &&
- pdev->vendor == PCI_VENDOR_ID_INTEL &&
- IS_ENABLED(CONFIG_VFIO_PCI_IGD)) {
- ret = vfio_pci_igd_init(vdev);
- if (ret && ret != -ENODEV) {
- pci_warn(pdev, "Failed to setup Intel IGD regions\n");
- goto disable_exit;
- }
- }
-
- vfio_pci_probe_mmaps(vdev);

return 0;
-
-disable_exit:
- vfio_pci_disable(vdev);
- return ret;
}

-static void vfio_pci_disable(struct vfio_pci_core_device *vdev)
+void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
{
struct pci_dev *pdev = vdev->pdev;
struct vfio_pci_dummy_resource *dummy_res, *tmp;
@@ -481,7 +460,7 @@ void vfio_pci_core_release(struct vfio_device *core_vdev)

vfio_pci_vf_token_user_add(vdev, -1);
vfio_spapr_pci_eeh_release(vdev->pdev);
- vfio_pci_disable(vdev);
+ vfio_pci_core_disable(vdev);

mutex_lock(&vdev->igate);
if (vdev->err_trigger) {
@@ -495,6 +474,13 @@ void vfio_pci_core_release(struct vfio_device *core_vdev)
mutex_unlock(&vdev->igate);
}

+void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev)
+{
+ vfio_pci_probe_mmaps(vdev);
+ vfio_spapr_pci_eeh_open(vdev->pdev);
+ vfio_pci_vf_token_user_add(vdev, 1);
+}
+
int vfio_pci_core_open(struct vfio_device *core_vdev)
{
struct vfio_pci_core_device *vdev =
@@ -503,13 +489,13 @@ int vfio_pci_core_open(struct vfio_device *core_vdev)

lockdep_assert_held(&core_vdev->reflck->lock);

- ret = vfio_pci_enable(vdev);
+ ret = vfio_pci_core_enable(vdev);
if (ret)
- goto error;
- vfio_spapr_pci_eeh_open(vdev->pdev);
- vfio_pci_vf_token_user_add(vdev, 1);
-error:
- return ret;
+ return ret;
+
+ vfio_pci_core_finish_enable(vdev);
+
+ return 0;
}

static int vfio_pci_get_irq_count(struct vfio_pci_core_device *vdev, int irq_type)
diff --git a/drivers/vfio/pci/vfio_pci_core.h b/drivers/vfio/pci/vfio_pci_core.h
index 245862d5d6e4..406e934e23b2 100644
--- a/drivers/vfio/pci/vfio_pci_core.h
+++ b/drivers/vfio/pci/vfio_pci_core.h
@@ -227,5 +227,13 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
void vfio_pci_core_request(struct vfio_device *core_vdev, unsigned int count);
int vfio_pci_core_match(struct vfio_device *core_vdev, char *buf);
int vfio_pci_core_reflck_attach(struct vfio_device *core_vdev);
+int vfio_pci_core_enable(struct vfio_pci_core_device *vdev);
+void vfio_pci_core_disable(struct vfio_pci_core_device *vdev);
+void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev);
+
+static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
+{
+ return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
+}

#endif /* VFIO_PCI_CORE_H */
--
2.21.0

2021-06-03 16:11:40

by Max Gurtovoy

[permalink] [raw]
Subject: [PATCH 08/11] PCI: add flags field to pci_device_id structure

This field will be used to allow pci modules to set some specific data
that can be used for matching, aliases and other hints. Add example for
"driver_override" pci drivers that will set a special prefix in the
modules.alias table. In the future, this flag will enforce
"driver_override" to work on drivers that specifically opt into this
feature. The udev utility will not try to load these drivers
automatically in order to bind to new discovered devices. This will be
because the modalias tables populated by those drivers will be different
from "regular" pci modalias tables. Userspace utilities, such as
libvirt, will need to adjust and bind devices according to new matching
mechanism with taking "driver_override" enforcement into consideration.

Signed-off-by: Max Gurtovoy <[email protected]>
---
Documentation/PCI/pci.rst | 1 +
include/linux/mod_devicetable.h | 9 +++++++++
include/linux/pci.h | 27 +++++++++++++++++++++++++++
scripts/mod/devicetable-offsets.c | 1 +
scripts/mod/file2alias.c | 8 ++++++--
5 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/Documentation/PCI/pci.rst b/Documentation/PCI/pci.rst
index 814b40f8360b..0855657daf93 100644
--- a/Documentation/PCI/pci.rst
+++ b/Documentation/PCI/pci.rst
@@ -103,6 +103,7 @@ need pass only as many optional fields as necessary:
- subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF)
- class and classmask fields default to 0
- driver_data defaults to 0UL.
+ - flags field defaults to 0.

Note that driver_data must match the value used by any of the pci_device_id
entries defined in the driver. This makes the driver_data field mandatory
diff --git a/include/linux/mod_devicetable.h b/include/linux/mod_devicetable.h
index 7d45b5f989b0..0f5c0355992c 100644
--- a/include/linux/mod_devicetable.h
+++ b/include/linux/mod_devicetable.h
@@ -16,6 +16,13 @@ typedef unsigned long kernel_ulong_t;

#define PCI_ANY_ID (~0)

+
+enum pci_id_flags {
+ PCI_ID_F_VFIO_DRIVER_OVERRIDE = 1 << 0,
+};
+
+#define PCI_ID_F_DRIVER_OVERRIDE PCI_ID_F_VFIO_DRIVER_OVERRIDE
+
/**
* struct pci_device_id - PCI device ID structure
* @vendor: Vendor ID to match (or PCI_ANY_ID)
@@ -34,12 +41,14 @@ typedef unsigned long kernel_ulong_t;
* Best practice is to use driver_data as an index
* into a static list of equivalent device types,
* instead of using it as a pointer.
+ * @flags: PCI flags of the driver. Bitmap of pci_id_flags enum.
*/
struct pci_device_id {
__u32 vendor, device; /* Vendor and device ID or PCI_ANY_ID*/
__u32 subvendor, subdevice; /* Subsystem ID's or PCI_ANY_ID */
__u32 class, class_mask; /* (class,subclass,prog-if) triplet */
kernel_ulong_t driver_data; /* Data private to the driver */
+ __u32 flags;
};


diff --git a/include/linux/pci.h b/include/linux/pci.h
index c20211e59a57..602603526327 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -898,6 +898,33 @@ struct pci_driver {
.vendor = (vend), .device = (dev), \
.subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID

+/**
+ * PCI_DEVICE_FLAGS - macro used to describe a PCI device with specific flags.
+ * @vend: the 16 bit PCI Vendor ID
+ * @dev: the 16 bit PCI Device ID
+ * @flags: PCI Device flags as a bitmap of pci_id_flags enum
+ *
+ * This macro is used to create a struct pci_device_id that matches a
+ * specific device. The subvendor and subdevice fields will be set to
+ * PCI_ANY_ID.
+ */
+#define PCI_DEVICE_FLAGS(vend, dev, fl) \
+ .vendor = (vend), .device = (dev), .subvendor = PCI_ANY_ID, \
+ .subdevice = PCI_ANY_ID, .flags = (fl)
+
+/**
+ * PCI_DRIVER_OVERRIDE_DEVICE_VFIO - macro used to describe a VFIO
+ * "driver_override" PCI device.
+ * @vend: the 16 bit PCI Vendor ID
+ * @dev: the 16 bit PCI Device ID
+ *
+ * This macro is used to create a struct pci_device_id that matches a
+ * specific device. The subvendor and subdevice fields will be set to
+ * PCI_ANY_ID and the flags will be set to PCI_ID_F_VFIO_DRIVER_OVERRIDE.
+ */
+#define PCI_DRIVER_OVERRIDE_DEVICE_VFIO(vend, dev) \
+ PCI_DEVICE_FLAGS(vend, dev, PCI_ID_F_VFIO_DRIVER_OVERRIDE)
+
/**
* PCI_DEVICE_SUB - macro used to describe a specific PCI device with subsystem
* @vend: the 16 bit PCI Vendor ID
diff --git a/scripts/mod/devicetable-offsets.c b/scripts/mod/devicetable-offsets.c
index 9bb6c7edccc4..b927c36b8333 100644
--- a/scripts/mod/devicetable-offsets.c
+++ b/scripts/mod/devicetable-offsets.c
@@ -42,6 +42,7 @@ int main(void)
DEVID_FIELD(pci_device_id, subdevice);
DEVID_FIELD(pci_device_id, class);
DEVID_FIELD(pci_device_id, class_mask);
+ DEVID_FIELD(pci_device_id, flags);

DEVID(ccw_device_id);
DEVID_FIELD(ccw_device_id, match_flags);
diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
index 7c97fa8e36bc..b0add27de795 100644
--- a/scripts/mod/file2alias.c
+++ b/scripts/mod/file2alias.c
@@ -426,7 +426,7 @@ static int do_ieee1394_entry(const char *filename,
return 1;
}

-/* Looks like: pci:vNdNsvNsdNbcNscNiN. */
+/* Looks like: pci:vNdNsvNsdNbcNscNiN or driver_override_pci:vNdNsvNsdNbcNscNiN. */
static int do_pci_entry(const char *filename,
void *symval, char *alias)
{
@@ -440,8 +440,12 @@ static int do_pci_entry(const char *filename,
DEF_FIELD(symval, pci_device_id, subdevice);
DEF_FIELD(symval, pci_device_id, class);
DEF_FIELD(symval, pci_device_id, class_mask);
+ DEF_FIELD(symval, pci_device_id, flags);

- strcpy(alias, "pci:");
+ if (flags & PCI_ID_F_VFIO_DRIVER_OVERRIDE)
+ strcpy(alias, "vfio_pci:");
+ else
+ strcpy(alias, "pci:");
ADD(alias, "v", vendor != PCI_ANY_ID, vendor);
ADD(alias, "d", device != PCI_ANY_ID, device);
ADD(alias, "sv", subvendor != PCI_ANY_ID, subvendor);
--
2.21.0

2021-06-03 16:12:00

by Max Gurtovoy

[permalink] [raw]
Subject: [PATCH 10/11] vfio-pci: introduce vfio_pci_core subsystem driver

Now that vfio_pci has been split into two source modules, one focusing
on the "struct pci_driver" (vfio_pci.c) and a toolbox library of code
(vfio_pci_core.c), complete the split and move them into two different
modules.

As before vfio_pci.ko continues to present the same interface under
sysfs and this change will have no functional impact.

Below is an example for adding new driver that will use vfio pci
subsystem:

+---------------------------------------------------+
| |
| VFIO |
| |
+---------------------------------------------------+

+---------------------------------------------------+
| |
| VFIO_PCI_CORE |
| |
+---------------------------------------------------+

+----------+ +---------------+ +--------------------+
| | | | | |
| VFIO_PCI | | MLX5_VFIO_PCI | | HISILICON_VFIO_PCI |
| | | | | |
+----------+ +---------------+ +--------------------+

Splitting into another module and adding exports allows creating new HW
specific vfio_pci drivers that can implement device specific
functionality, such as VFIO migration interfaces or specialized device
requirements.

Signed-off-by: Max Gurtovoy <[email protected]>
---
drivers/vfio/pci/Kconfig | 18 ++++++---
drivers/vfio/pci/Makefile | 11 +++--
drivers/vfio/pci/vfio_pci.c | 14 ++-----
drivers/vfio/pci/vfio_pci_config.c | 2 +-
drivers/vfio/pci/vfio_pci_core.c | 40 ++++++++++++++++---
drivers/vfio/pci/vfio_pci_igd.c | 2 +-
drivers/vfio/pci/vfio_pci_intrs.c | 2 +-
drivers/vfio/pci/vfio_pci_rdwr.c | 2 +-
drivers/vfio/pci/vfio_pci_zdev.c | 2 +-
.../pci => include/linux}/vfio_pci_core.h | 10 ++---
10 files changed, 66 insertions(+), 37 deletions(-)
rename {drivers/vfio/pci => include/linux}/vfio_pci_core.h (96%)

diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 5e2e1b9a9fd3..384d06661f30 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0-only
-config VFIO_PCI
- tristate "VFIO support for PCI devices"
+config VFIO_PCI_CORE
+ tristate "VFIO core support for PCI devices"
depends on VFIO && PCI && EVENTFD
depends on MMU
select VFIO_VIRQFD
@@ -11,9 +11,17 @@ config VFIO_PCI

If you don't know what to do here, say N.

+config VFIO_PCI
+ tristate "VFIO support for PCI devices"
+ depends on VFIO_PCI_CORE
+ help
+ This provides a generic PCI support using the VFIO framework.
+
+ If you don't know what to do here, say N.
+
config VFIO_PCI_VGA
bool "VFIO PCI support for VGA devices"
- depends on VFIO_PCI && X86 && VGA_ARB
+ depends on VFIO_PCI_CORE && X86 && VGA_ARB
help
Support for VGA extension to VFIO PCI. This exposes an additional
region on VGA devices for accessing legacy VGA addresses used by
@@ -22,11 +30,11 @@ config VFIO_PCI_VGA
If you don't know what to do here, say N.

config VFIO_PCI_MMAP
- depends on VFIO_PCI
+ depends on VFIO_PCI_CORE
def_bool y if !S390

config VFIO_PCI_INTX
- depends on VFIO_PCI
+ depends on VFIO_PCI_CORE
def_bool y if !S390

config VFIO_PCI_IGD
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index 8aa517b4b671..ddba4759cde7 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -1,7 +1,10 @@
# SPDX-License-Identifier: GPL-2.0-only

-vfio-pci-y := vfio_pci.o vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
-vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
-vfio-pci-$(CONFIG_S390) += vfio_pci_zdev.o
-
+obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o
obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
+
+vfio-pci-core-y := vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
+vfio-pci-core-$(CONFIG_S390) += vfio_pci_zdev.o
+
+vfio-pci-y := vfio_pci.o
+vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 9beb4b841945..56870a6540d7 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -25,7 +25,7 @@
#include <linux/types.h>
#include <linux/uaccess.h>

-#include "vfio_pci_core.h"
+#include <linux/vfio_pci_core.h>

#define DRIVER_VERSION "0.2"
#define DRIVER_AUTHOR "Alex Williamson <[email protected]>"
@@ -141,6 +141,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (ret)
goto out_free;

+ dev_set_drvdata(&pdev->dev, vdev);
return 0;

out_free:
@@ -185,7 +186,6 @@ static struct pci_driver vfio_pci_driver = {
static void __exit vfio_pci_cleanup(void)
{
pci_unregister_driver(&vfio_pci_driver);
- vfio_pci_core_cleanup();
}

static void __init vfio_pci_fill_ids(void)
@@ -233,14 +233,10 @@ static int __init vfio_pci_init(void)
{
int ret;

- ret = vfio_pci_core_init();
- if (ret)
- return ret;
-
/* Register and scan for devices */
ret = pci_register_driver(&vfio_pci_driver);
if (ret)
- goto out;
+ return ret;

vfio_pci_fill_ids();

@@ -248,10 +244,6 @@ static int __init vfio_pci_init(void)
pr_warn("device denylist disabled.\n");

return 0;
-
-out:
- vfio_pci_core_cleanup();
- return ret;
}

module_init(vfio_pci_init);
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 1f034f768a27..6e58b4bf7a60 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -26,7 +26,7 @@
#include <linux/vfio.h>
#include <linux/slab.h>

-#include "vfio_pci_core.h"
+#include <linux/vfio_pci_core.h>

/* Fake capability ID for standard config space */
#define PCI_CAP_ID_BASIC 0
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 39a3f18bbc08..a1ce79160f6f 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -8,6 +8,8 @@
* Author: Tom Lyon, [email protected]
*/

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/device.h>
#include <linux/eventfd.h>
#include <linux/file.h>
@@ -25,7 +27,11 @@
#include <linux/nospec.h>
#include <linux/sched/mm.h>

-#include "vfio_pci_core.h"
+#include <linux/vfio_pci_core.h>
+
+#define DRIVER_VERSION "0.2"
+#define DRIVER_AUTHOR "Alex Williamson <[email protected]>"
+#define DRIVER_DESC "core driver for VFIO based PCI devices"

static bool nointxmask;
module_param_named(nointxmask, nointxmask, bool, S_IRUGO | S_IWUSR);
@@ -316,6 +322,7 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)

return 0;
}
+EXPORT_SYMBOL_GPL(vfio_pci_core_enable);

void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
{
@@ -415,6 +422,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
if (!disable_idle_d3)
vfio_pci_set_power_state(vdev, PCI_D3hot);
}
+EXPORT_SYMBOL_GPL(vfio_pci_core_disable);

static struct vfio_pci_core_device *get_pf_vdev(struct vfio_pci_core_device *vdev)
{
@@ -473,6 +481,7 @@ void vfio_pci_core_release(struct vfio_device *core_vdev)
}
mutex_unlock(&vdev->igate);
}
+EXPORT_SYMBOL_GPL(vfio_pci_core_release);

void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev)
{
@@ -480,6 +489,7 @@ void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev)
vfio_spapr_pci_eeh_open(vdev->pdev);
vfio_pci_vf_token_user_add(vdev, 1);
}
+EXPORT_SYMBOL_GPL(vfio_pci_core_finish_enable);

int vfio_pci_core_open(struct vfio_device *core_vdev)
{
@@ -497,6 +507,7 @@ int vfio_pci_core_open(struct vfio_device *core_vdev)

return 0;
}
+EXPORT_SYMBOL_GPL(vfio_pci_core_open);

static int vfio_pci_get_irq_count(struct vfio_pci_core_device *vdev, int irq_type)
{
@@ -681,6 +692,7 @@ int vfio_pci_register_dev_region(struct vfio_pci_core_device *vdev,

return 0;
}
+EXPORT_SYMBOL_GPL(vfio_pci_register_dev_region);

struct vfio_devices {
struct vfio_pci_core_device **devices;
@@ -1287,6 +1299,7 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,

return -ENOTTY;
}
+EXPORT_SYMBOL_GPL(vfio_pci_core_ioctl);

static ssize_t vfio_pci_rw(struct vfio_pci_core_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite)
@@ -1330,6 +1343,7 @@ ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf,

return vfio_pci_rw(vdev, buf, count, ppos, false);
}
+EXPORT_SYMBOL_GPL(vfio_pci_core_read);

ssize_t vfio_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
size_t count, loff_t *ppos)
@@ -1342,6 +1356,7 @@ ssize_t vfio_pci_core_write(struct vfio_device *core_vdev, const char __user *bu

return vfio_pci_rw(vdev, (char __user *)buf, count, ppos, true);
}
+EXPORT_SYMBOL_GPL(vfio_pci_core_write);

/* Return 1 on zap and vma_lock acquired, 0 on contention (only with @try) */
static int vfio_pci_zap_and_vma_lock(struct vfio_pci_core_device *vdev, bool try)
@@ -1607,6 +1622,7 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma

return 0;
}
+EXPORT_SYMBOL_GPL(vfio_pci_core_mmap);

void vfio_pci_core_request(struct vfio_device *core_vdev, unsigned int count)
{
@@ -1629,6 +1645,7 @@ void vfio_pci_core_request(struct vfio_device *core_vdev, unsigned int count)

mutex_unlock(&vdev->igate);
}
+EXPORT_SYMBOL_GPL(vfio_pci_core_request);

static int vfio_pci_validate_vf_token(struct vfio_pci_core_device *vdev,
bool vf_token, uuid_t *uuid)
@@ -1773,6 +1790,7 @@ int vfio_pci_core_match(struct vfio_device *core_vdev, char *buf)

return 1; /* Match */
}
+EXPORT_SYMBOL_GPL(vfio_pci_core_match);

static int vfio_pci_reflck_find(struct pci_dev *pdev, void *data)
{
@@ -1814,6 +1832,7 @@ int vfio_pci_core_reflck_attach(struct vfio_device *core_vdev)

return PTR_ERR_OR_ZERO(core_vdev->reflck);
}
+EXPORT_SYMBOL_GPL(vfio_pci_core_reflck_attach);

static int vfio_pci_bus_notifier(struct notifier_block *nb,
unsigned long action, void *data)
@@ -1972,7 +1991,6 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev,
ret = vfio_register_group_dev(&vdev->vdev);
if (ret)
goto out_power;
- dev_set_drvdata(&pdev->dev, vdev);
return 0;

out_power:
@@ -1987,6 +2005,7 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev,
vfio_iommu_group_put(group, &pdev->dev);
return ret;
}
+EXPORT_SYMBOL_GPL(vfio_pci_core_register_device);

void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev)
{
@@ -2009,6 +2028,7 @@ void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev)
kfree(vdev->region);
kfree(vdev->pm_save);
}
+EXPORT_SYMBOL_GPL(vfio_pci_core_unregister_device);

static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
pci_channel_state_t state)
@@ -2054,10 +2074,12 @@ int vfio_pci_core_sriov_configure(struct pci_dev *pdev, int nr_virtfn)

return ret < 0 ? ret : nr_virtfn;
}
+EXPORT_SYMBOL_GPL(vfio_pci_core_sriov_configure);

const struct pci_error_handlers vfio_pci_core_err_handlers = {
.error_detected = vfio_pci_aer_err_detected,
};
+EXPORT_SYMBOL_GPL(vfio_pci_core_err_handlers);

static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data)
{
@@ -2197,15 +2219,21 @@ static void vfio_pci_try_bus_reset(struct vfio_pci_core_device *vdev)
kfree(devs.devices);
}

-/* This will become the __exit function of vfio_pci_core.ko */
-void vfio_pci_core_cleanup(void)
+static void vfio_pci_core_cleanup(void)
{
vfio_pci_uninit_perm_bits();
}

-/* This will become the __init function of vfio_pci_core.ko */
-int __init vfio_pci_core_init(void)
+static int __init vfio_pci_core_init(void)
{
/* Allocate shared config space permission data used by all devices */
return vfio_pci_init_perm_bits();
}
+
+module_init(vfio_pci_core_init);
+module_exit(vfio_pci_core_cleanup);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/drivers/vfio/pci/vfio_pci_igd.c b/drivers/vfio/pci/vfio_pci_igd.c
index 2295eaac4bc9..f04774cd3a7f 100644
--- a/drivers/vfio/pci/vfio_pci_igd.c
+++ b/drivers/vfio/pci/vfio_pci_igd.c
@@ -15,7 +15,7 @@
#include <linux/uaccess.h>
#include <linux/vfio.h>

-#include "vfio_pci_core.h"
+#include <linux/vfio_pci_core.h>

#define OPREGION_SIGNATURE "IntelGraphicsMem"
#define OPREGION_SIZE (8 * 1024)
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 945ddbdf4d11..6069a11fb51a 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -20,7 +20,7 @@
#include <linux/wait.h>
#include <linux/slab.h>

-#include "vfio_pci_core.h"
+#include <linux/vfio_pci_core.h>

/*
* INTx
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 8fff4689dd44..57d3b2cbbd8e 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -17,7 +17,7 @@
#include <linux/vfio.h>
#include <linux/vgaarb.h>

-#include "vfio_pci_core.h"
+#include <linux/vfio_pci_core.h>

#ifdef __LITTLE_ENDIAN
#define vfio_ioread64 ioread64
diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
index 2ffbdc11f089..fe4def9ffffb 100644
--- a/drivers/vfio/pci/vfio_pci_zdev.c
+++ b/drivers/vfio/pci/vfio_pci_zdev.c
@@ -19,7 +19,7 @@
#include <asm/pci_clp.h>
#include <asm/pci_io.h>

-#include "vfio_pci_core.h"
+#include <linux/vfio_pci_core.h>

/*
* Add the Base PCI Function information to the device info region.
diff --git a/drivers/vfio/pci/vfio_pci_core.h b/include/linux/vfio_pci_core.h
similarity index 96%
rename from drivers/vfio/pci/vfio_pci_core.h
rename to include/linux/vfio_pci_core.h
index 406e934e23b2..5a48c3c552b1 100644
--- a/drivers/vfio/pci/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -171,10 +171,10 @@ extern void vfio_pci_uninit_perm_bits(void);
extern int vfio_config_init(struct vfio_pci_core_device *vdev);
extern void vfio_config_free(struct vfio_pci_core_device *vdev);

-extern int vfio_pci_register_dev_region(struct vfio_pci_core_device *vdev,
- unsigned int type, unsigned int subtype,
- const struct vfio_pci_regops *ops,
- size_t size, u32 flags, void *data);
+int vfio_pci_register_dev_region(struct vfio_pci_core_device *vdev,
+ unsigned int type, unsigned int subtype,
+ const struct vfio_pci_regops *ops,
+ size_t size, u32 flags, void *data);

extern int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev,
pci_power_t state);
@@ -207,8 +207,6 @@ static inline int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
#endif

/* Will be exported for vfio pci drivers usage */
-void vfio_pci_core_cleanup(void);
-int vfio_pci_core_init(void);
void vfio_pci_core_release(struct vfio_device *core_vdev);
int vfio_pci_core_open(struct vfio_device *core_vdev);
int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev,
--
2.21.0

2021-06-03 16:12:17

by Max Gurtovoy

[permalink] [raw]
Subject: [PATCH 11/11] mlx5-vfio-pci: add new vfio_pci driver for mlx5 devices

This is the initial step for adding vendor specific vfio_pci driver for
mlx5 devices. mlx5_vfio_pci use vfio_pci_core to register to the VFIO
subsystem and also to implement the basic functionality of a pci device.

It will also extend this basic functionality and add mlx5 specific logic
for various features (such as live migration, for example).

Note: Although we've created the mlx5-vfio-pci.ko, the binding to
vfio-pci.ko will still work as before. It's fully backward compatible.
Of course, the extended vendor functionality will not exist in case one
will bind the device to the generic vfio_pci.ko.

Signed-off-by: Max Gurtovoy <[email protected]>
---
drivers/vfio/pci/Kconfig | 9 +++
drivers/vfio/pci/Makefile | 3 +
drivers/vfio/pci/mlx5_vfio_pci.c | 130 +++++++++++++++++++++++++++++++
3 files changed, 142 insertions(+)
create mode 100644 drivers/vfio/pci/mlx5_vfio_pci.c

diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 384d06661f30..9cdef46dd299 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -48,3 +48,12 @@ config VFIO_PCI_IGD
and LPC bridge config space.

To enable Intel IGD assignment through vfio-pci, say Y.
+
+config MLX5_VFIO_PCI
+ tristate "VFIO support for MLX5 PCI devices"
+ depends on VFIO_PCI_CORE && MLX5_CORE
+ help
+ This provides a generic PCI support for MLX5 devices using the VFIO
+ framework.
+
+ If you don't know what to do here, say N.
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index ddba4759cde7..a0df9c2a4bd9 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -2,9 +2,12 @@

obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o
obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
+obj-$(CONFIG_MLX5_VFIO_PCI) += mlx5-vfio-pci.o

vfio-pci-core-y := vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
vfio-pci-core-$(CONFIG_S390) += vfio_pci_zdev.o

vfio-pci-y := vfio_pci.o
vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
+
+mlx5-vfio-pci-y := mlx5_vfio_pci.o
diff --git a/drivers/vfio/pci/mlx5_vfio_pci.c b/drivers/vfio/pci/mlx5_vfio_pci.c
new file mode 100644
index 000000000000..0153682a3d3f
--- /dev/null
+++ b/drivers/vfio/pci/mlx5_vfio_pci.c
@@ -0,0 +1,130 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2021, Mellanox Technologies. All rights reserved.
+ * Author: Max Gurtovoy <[email protected]>
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/device.h>
+#include <linux/eventfd.h>
+#include <linux/file.h>
+#include <linux/interrupt.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/pci.h>
+#include <linux/pm_runtime.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/sched/mm.h>
+#include <linux/mlx5/driver.h>
+
+#include <linux/vfio_pci_core.h>
+
+static int mlx5_vfio_pci_open(struct vfio_device *core_vdev)
+{
+ struct vfio_pci_core_device *vdev =
+ container_of(core_vdev, struct vfio_pci_core_device, vdev);
+ int ret;
+
+ lockdep_assert_held(&core_vdev->reflck->lock);
+
+ ret = vfio_pci_core_enable(vdev);
+ if (ret)
+ return ret;
+
+ /* TODO: register migration region here for capable devices */
+
+ vfio_pci_core_finish_enable(vdev);
+
+ return 0;
+}
+
+static const struct vfio_device_ops mlx5_vfio_pci_ops = {
+ .name = "mlx5-vfio-pci",
+ .open = mlx5_vfio_pci_open,
+ .release = vfio_pci_core_release,
+ .ioctl = vfio_pci_core_ioctl,
+ .read = vfio_pci_core_read,
+ .write = vfio_pci_core_write,
+ .mmap = vfio_pci_core_mmap,
+ .request = vfio_pci_core_request,
+ .match = vfio_pci_core_match,
+ .reflck_attach = vfio_pci_core_reflck_attach,
+};
+
+static int mlx5_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+ struct vfio_pci_core_device *vdev;
+ int ret;
+
+ vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
+ if (!vdev)
+ return -ENOMEM;
+
+ ret = vfio_pci_core_register_device(vdev, pdev, &mlx5_vfio_pci_ops);
+ if (ret)
+ goto out_free;
+
+ dev_set_drvdata(&pdev->dev, vdev);
+ return 0;
+
+out_free:
+ kfree(vdev);
+ return ret;
+}
+
+static void mlx5_vfio_pci_remove(struct pci_dev *pdev)
+{
+ struct vfio_pci_core_device *vdev = dev_get_drvdata(&pdev->dev);
+
+ vfio_pci_core_unregister_device(vdev);
+ kfree(vdev);
+}
+
+static const struct pci_device_id mlx5_vfio_pci_table[] = {
+ { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_MELLANOX, 0x101b) }, /* ConnectX-6 */
+ { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_MELLANOX, 0x101c) }, /* ConnectX-6 VF */
+ { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_MELLANOX, 0x101d) }, /* ConnectX-6 Dx */
+ { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_MELLANOX, 0x101e) }, /* ConnectX Family mlx5Gen Virtual Function */
+ { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_MELLANOX, 0x101f) }, /* ConnectX-6 LX */
+ { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_MELLANOX, 0x1021) }, /* ConnectX-7 */
+ { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_MELLANOX, 0xa2d2) }, /* BlueField integrated ConnectX-5 network controller */
+ { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_MELLANOX, 0xa2d3) }, /* BlueField integrated ConnectX-5 network controller VF */
+ { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_MELLANOX, 0xa2d6) }, /* BlueField-2 integrated ConnectX-6 Dx network controller */
+ { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_MELLANOX, 0xa2dc) }, /* BlueField-3 integrated ConnectX-7 network controller */
+ { 0, }
+};
+
+MODULE_DEVICE_TABLE(pci, mlx5_vfio_pci_table);
+
+static struct pci_driver mlx5_vfio_pci_driver = {
+ .name = "mlx5-vfio-pci",
+ .id_table = mlx5_vfio_pci_table,
+ .probe = mlx5_vfio_pci_probe,
+ .remove = mlx5_vfio_pci_remove,
+#ifdef CONFIG_PCI_IOV
+ .sriov_configure = vfio_pci_core_sriov_configure,
+#endif
+ .err_handler = &vfio_pci_core_err_handlers,
+};
+
+static void __exit mlx5_vfio_pci_cleanup(void)
+{
+ pci_unregister_driver(&mlx5_vfio_pci_driver);
+}
+
+static int __init mlx5_vfio_pci_init(void)
+{
+ return pci_register_driver(&mlx5_vfio_pci_driver);
+}
+
+module_init(mlx5_vfio_pci_init);
+module_exit(mlx5_vfio_pci_cleanup);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Max Gurtovoy <[email protected]>");
+MODULE_DESCRIPTION("MLX5 VFIO PCI - User Level meta-driver for MLX5 device family");
--
2.21.0

2021-06-03 16:13:16

by Max Gurtovoy

[permalink] [raw]
Subject: [PATCH 09/11] PCI: add matching checks for driver_override binding

Allowing any driver in the system to be unconditionally bound to any
PCI HW is dangerous. Connecting a driver to a physical HW device it was
never intended to operate may trigger exploitable kernel bugs, or worse.
It also allows userspace to load and run kernel code that otherwise
would never be runnable on the system.

driver_override was designed to make it easier to load vfio_pci, so
focus it on that single use case. driver_override will only work on
drivers that specifically opt into this feature and the driver now has
the opportunity to provide a proper match table that indicates what HW
it can properly support. vfio-pci continues to support everything.

This also causes the modalias tables to be populated with dedicated
match table and userspace now becomes aware that vfio-pci can be loaded
against any PCI device using driver_override.

Signed-off-by: Max Gurtovoy <[email protected]>
---
Documentation/ABI/testing/sysfs-bus-pci | 6 +++++-
drivers/pci/pci-driver.c | 22 ++++++++++++----------
drivers/vfio/pci/vfio_pci.c | 9 ++++++++-
3 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index ef00fada2efb..6d78970b1c69 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -289,7 +289,11 @@ Description:
will not bind to any driver. This also allows devices to
opt-out of driver binding using a driver_override name such as
"none". Only a single driver may be specified in the override,
- there is no support for parsing delimiters.
+ there is no support for parsing delimiters. The binding to the
+ device is allowed if and only if the matching driver has
+ enabled the driver_override alias in the ID table (by setting
+ a suitable bit in the "flags" field of pci_device_id
+ structure).

What: /sys/bus/pci/devices/.../numa_node
Date: Oct 2014
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index ec44a79e951a..e4037db817da 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -115,13 +115,6 @@ const struct pci_device_id *pci_match_id(const struct pci_device_id *ids,
}
EXPORT_SYMBOL(pci_match_id);

-static const struct pci_device_id pci_device_id_any = {
- .vendor = PCI_ANY_ID,
- .device = PCI_ANY_ID,
- .subvendor = PCI_ANY_ID,
- .subdevice = PCI_ANY_ID,
-};
-
/**
* pci_match_device - See if a device matches a driver's list of IDs
* @drv: the PCI driver to match against
@@ -137,6 +130,7 @@ static const struct pci_device_id *pci_match_device(struct pci_driver *drv,
{
struct pci_dynid *dynid;
const struct pci_device_id *found_id = NULL;
+ bool is_driver_override;

/* When driver_override is set, only bind to the matching driver */
if (dev->driver_override && strcmp(dev->driver_override, drv->name))
@@ -155,9 +149,17 @@ static const struct pci_device_id *pci_match_device(struct pci_driver *drv,
if (!found_id)
found_id = pci_match_id(drv->id_table, dev);

- /* driver_override will always match, send a dummy id */
- if (!found_id && dev->driver_override)
- found_id = &pci_device_id_any;
+ /*
+ * if and only if PCI_ID_F_DRIVER_OVERRIDE flag is set, driver_override
+ * should be provided.
+ */
+ if (found_id) {
+ is_driver_override =
+ (found_id->flags & PCI_ID_F_DRIVER_OVERRIDE) != 0;
+ if ((is_driver_override && !dev->driver_override) ||
+ (dev->driver_override && !is_driver_override))
+ return NULL;
+ }

return found_id;
}
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 850ea3a94e28..9beb4b841945 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -166,9 +166,16 @@ static int vfio_pci_sriov_configure(struct pci_dev *pdev, int nr_virtfn)
return vfio_pci_core_sriov_configure(pdev, nr_virtfn);
}

+static const struct pci_device_id vfio_pci_table[] = {
+ { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_ANY_ID, PCI_ANY_ID) }, /* match all by default */
+ { 0, }
+};
+
+MODULE_DEVICE_TABLE(pci, vfio_pci_table);
+
static struct pci_driver vfio_pci_driver = {
.name = "vfio-pci",
- .id_table = NULL, /* only dynamic ids */
+ .id_table = vfio_pci_table,
.probe = vfio_pci_probe,
.remove = vfio_pci_remove,
.sriov_configure = vfio_pci_sriov_configure,
--
2.21.0

2021-06-08 21:28:37

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding

On Thu, 3 Jun 2021 19:08:07 +0300
Max Gurtovoy <[email protected]> wrote:

> Allowing any driver in the system to be unconditionally bound to any
> PCI HW is dangerous. Connecting a driver to a physical HW device it was
> never intended to operate may trigger exploitable kernel bugs, or worse.
> It also allows userspace to load and run kernel code that otherwise
> would never be runnable on the system.

This is just another way that an admin can do bad things, with the
intention that they know what they're doing and if not they get to
keep the pieces. There's also still the new_id scheme for binding the
wrong drivers to devices, so the hole this claims to be addressing is
still fully present.

> driver_override was designed to make it easier to load vfio_pci, so

Actually driver_override was designed to resolve the non-deterministic
behavior of new_id, which allows inserting dynamic match entries. The
problem is those match entries match any device that might come along
during the time window when userspace is trying to bind a specific
device to a specific driver. driver_override flipped the problem to
match a device to a driver rather than vice versa. Other bus types
have since adopted driver_override interfaces as well.

> focus it on that single use case. driver_override will only work on

It's used for other use cases across numerous bus types now. For
instance, how can I user driver_override to bind pci-stub to a device
after this? driverctl(8) uses driver_override to perform arbitrary
driver overrides, this breaks all but the vfio-pci use case.

> drivers that specifically opt into this feature and the driver now has
> the opportunity to provide a proper match table that indicates what HW
> it can properly support. vfio-pci continues to support everything.

In doing so, this also breaks the new_id method for vfio-pci. Sorry,
with so many userspace regressions, crippling the driver_override
interface with an assumption of such a narrow focus, creating a vfio
specific match flag, I don't see where this can go. Thanks,

Alex

2021-06-08 21:28:56

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 10/11] vfio-pci: introduce vfio_pci_core subsystem driver

On Thu, 3 Jun 2021 19:08:08 +0300
Max Gurtovoy <[email protected]> wrote:
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 5e2e1b9a9fd3..384d06661f30 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -1,6 +1,6 @@
> # SPDX-License-Identifier: GPL-2.0-only
> -config VFIO_PCI
> - tristate "VFIO support for PCI devices"
> +config VFIO_PCI_CORE
> + tristate "VFIO core support for PCI devices"
> depends on VFIO && PCI && EVENTFD
> depends on MMU
> select VFIO_VIRQFD
> @@ -11,9 +11,17 @@ config VFIO_PCI
>
> If you don't know what to do here, say N.
>
> +config VFIO_PCI
> + tristate "VFIO support for PCI devices"
> + depends on VFIO_PCI_CORE
> + help
> + This provides a generic PCI support using the VFIO framework.
> +
> + If you don't know what to do here, say N.
> +

I think it's going to generate a lot of user and distro frustration to
hide VFIO_PCI behind a new VFIO_PCI_CORE option. The core should be a
dependency *selected* by the drivers, not a prerequisite for the
driver. Thanks,

Alex

2021-06-08 22:47:01

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding

On Tue, Jun 08, 2021 at 03:26:43PM -0600, Alex Williamson wrote:
> > drivers that specifically opt into this feature and the driver now has
> > the opportunity to provide a proper match table that indicates what HW
> > it can properly support. vfio-pci continues to support everything.
>
> In doing so, this also breaks the new_id method for vfio-pci.

Does it? How? The driver_override flag is per match entry not for the
entire device so new_id added things will work the same as before as
their new match entry's flags will be zero.

> Sorry, with so many userspace regressions, crippling the
> driver_override interface with an assumption of such a narrow focus,
> creating a vfio specific match flag, I don't see where this can go.
> Thanks,

On the other hand it overcomes all the objections from the last go
round: how userspace figures out which driver to use with
driver_override and integrating the universal driver into the scheme.

pci_stub could be delt with by marking it for driver_override like
vfio_pci.

But driverctl as a general tool working with any module is not really
addressable.

Is the only issue the blocking of the arbitary binding? That is not a
critical peice of this, IIRC

Jason

2021-06-09 12:17:04

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding

On Tue, 8 Jun 2021 19:45:17 -0300
Jason Gunthorpe <[email protected]> wrote:

> On Tue, Jun 08, 2021 at 03:26:43PM -0600, Alex Williamson wrote:
> > > drivers that specifically opt into this feature and the driver now has
> > > the opportunity to provide a proper match table that indicates what HW
> > > it can properly support. vfio-pci continues to support everything.
> >
> > In doing so, this also breaks the new_id method for vfio-pci.
>
> Does it? How? The driver_override flag is per match entry not for the
> entire device so new_id added things will work the same as before as
> their new match entry's flags will be zero.

Hmm, that might have been a testing issue; combining driverctl with
manual new_id testing might have left a driver_override in place.

> > Sorry, with so many userspace regressions, crippling the
> > driver_override interface with an assumption of such a narrow focus,
> > creating a vfio specific match flag, I don't see where this can go.
> > Thanks,
>
> On the other hand it overcomes all the objections from the last go
> round: how userspace figures out which driver to use with
> driver_override and integrating the universal driver into the scheme.
>
> pci_stub could be delt with by marking it for driver_override like
> vfio_pci.

By marking it a "vfio driver override"? :-\

> But driverctl as a general tool working with any module is not really
> addressable.
>
> Is the only issue the blocking of the arbitary binding? That is not a
> critical peice of this, IIRC

We can't break userspace, which means new_id and driver_override need
to work as they do now. There are scads of driver binding scripts in
the wild, for vfio-pci and other drivers. We can't assume such a
narrow scope. Thanks,

Alex

2021-06-09 14:11:35

by Max Gurtovoy

[permalink] [raw]
Subject: Re: [PATCH 10/11] vfio-pci: introduce vfio_pci_core subsystem driver


On 6/9/2021 12:26 AM, Alex Williamson wrote:
> On Thu, 3 Jun 2021 19:08:08 +0300
> Max Gurtovoy <[email protected]> wrote:
>> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
>> index 5e2e1b9a9fd3..384d06661f30 100644
>> --- a/drivers/vfio/pci/Kconfig
>> +++ b/drivers/vfio/pci/Kconfig
>> @@ -1,6 +1,6 @@
>> # SPDX-License-Identifier: GPL-2.0-only
>> -config VFIO_PCI
>> - tristate "VFIO support for PCI devices"
>> +config VFIO_PCI_CORE
>> + tristate "VFIO core support for PCI devices"
>> depends on VFIO && PCI && EVENTFD
>> depends on MMU
>> select VFIO_VIRQFD
>> @@ -11,9 +11,17 @@ config VFIO_PCI
>>
>> If you don't know what to do here, say N.
>>
>> +config VFIO_PCI
>> + tristate "VFIO support for PCI devices"
>> + depends on VFIO_PCI_CORE
>> + help
>> + This provides a generic PCI support using the VFIO framework.
>> +
>> + If you don't know what to do here, say N.
>> +
> I think it's going to generate a lot of user and distro frustration to
> hide VFIO_PCI behind a new VFIO_PCI_CORE option. The core should be a
> dependency *selected* by the drivers, not a prerequisite for the
> driver. Thanks,

I'll fix that. Thanks.


>
> Alex
>

2021-06-09 17:14:25

by Max Gurtovoy

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding


On 6/9/2021 4:27 AM, Alex Williamson wrote:
> On Tue, 8 Jun 2021 19:45:17 -0300
> Jason Gunthorpe <[email protected]> wrote:
>
>> On Tue, Jun 08, 2021 at 03:26:43PM -0600, Alex Williamson wrote:
>>>> drivers that specifically opt into this feature and the driver now has
>>>> the opportunity to provide a proper match table that indicates what HW
>>>> it can properly support. vfio-pci continues to support everything.
>>> In doing so, this also breaks the new_id method for vfio-pci.
>> Does it? How? The driver_override flag is per match entry not for the
>> entire device so new_id added things will work the same as before as
>> their new match entry's flags will be zero.
> Hmm, that might have been a testing issue; combining driverctl with
> manual new_id testing might have left a driver_override in place.
>
>>> Sorry, with so many userspace regressions, crippling the
>>> driver_override interface with an assumption of such a narrow focus,
>>> creating a vfio specific match flag, I don't see where this can go.
>>> Thanks,
>> On the other hand it overcomes all the objections from the last go
>> round: how userspace figures out which driver to use with
>> driver_override and integrating the universal driver into the scheme.
>>
>> pci_stub could be delt with by marking it for driver_override like
>> vfio_pci.
> By marking it a "vfio driver override"? :-\

Of course not. We'll mark it as "stub driver override".

>
>> But driverctl as a general tool working with any module is not really
>> addressable.
>>
>> Is the only issue the blocking of the arbitary binding? That is not a
>> critical peice of this, IIRC
> We can't break userspace, which means new_id and driver_override need
> to work as they do now. There are scads of driver binding scripts in
> the wild, for vfio-pci and other drivers. We can't assume such a
> narrow scope. Thanks,
>
> Alex
>

2021-06-13 08:24:16

by Max Gurtovoy

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding


On 6/9/2021 4:27 AM, Alex Williamson wrote:
> On Tue, 8 Jun 2021 19:45:17 -0300
> Jason Gunthorpe <[email protected]> wrote:
>
>> On Tue, Jun 08, 2021 at 03:26:43PM -0600, Alex Williamson wrote:
>>>> drivers that specifically opt into this feature and the driver now has
>>>> the opportunity to provide a proper match table that indicates what HW
>>>> it can properly support. vfio-pci continues to support everything.
>>> In doing so, this also breaks the new_id method for vfio-pci.
>> Does it? How? The driver_override flag is per match entry not for the
>> entire device so new_id added things will work the same as before as
>> their new match entry's flags will be zero.
> Hmm, that might have been a testing issue; combining driverctl with
> manual new_id testing might have left a driver_override in place.
>
>>> Sorry, with so many userspace regressions, crippling the
>>> driver_override interface with an assumption of such a narrow focus,
>>> creating a vfio specific match flag, I don't see where this can go.
>>> Thanks,
>> On the other hand it overcomes all the objections from the last go
>> round: how userspace figures out which driver to use with
>> driver_override and integrating the universal driver into the scheme.
>>
>> pci_stub could be delt with by marking it for driver_override like
>> vfio_pci.
> By marking it a "vfio driver override"? :-\
>
>> But driverctl as a general tool working with any module is not really
>> addressable.
>>
>> Is the only issue the blocking of the arbitary binding? That is not a
>> critical peice of this, IIRC
> We can't break userspace, which means new_id and driver_override need
> to work as they do now. There are scads of driver binding scripts in
> the wild, for vfio-pci and other drivers. We can't assume such a
> narrow scope. Thanks,

what about the following code ?

@@ -152,12 +152,28 @@ static const struct pci_device_id
*pci_match_device(struct pci_driver *drv,
        }
        spin_unlock(&drv->dynids.lock);

-       if (!found_id)
-               found_id = pci_match_id(drv->id_table, dev);
+       if (found_id)
+               return found_id;

-       /* driver_override will always match, send a dummy id */
-       if (!found_id && dev->driver_override)
+       found_id = pci_match_id(drv->id_table, dev);
+       if (found_id) {
+               /*
+                * if we found id in the static table, we must fulfill the
+                * matching flags (i.e. if PCI_ID_F_DRIVER_OVERRIDE flag is
+                * set, driver_override should be provided).
+                */
+               bool is_driver_override =
+                       (found_id->flags & PCI_ID_F_DRIVER_OVERRIDE) != 0;
+               if ((is_driver_override && !dev->driver_override) ||
+                   (dev->driver_override && !is_driver_override))
+                       return NULL;
+       } else if (dev->driver_override) {
+               /*
+                * if we didn't find suitable id in the static table,
+                * driver_override will still , send a dummy id
+                */
                found_id = &pci_device_id_any;
+       }

        return found_id;
 }


dynamic ids (new_id) works as before.

Old driver_override works as before.

For "new" driver_override we must fulfill the new rules.

>
> Alex
>

2021-06-14 05:46:54

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding

On Sun, Jun 13, 2021 at 11:19:46AM +0300, Max Gurtovoy wrote:
> what about the following code ?
>
> @@ -152,12 +152,28 @@ static const struct pci_device_id
> *pci_match_device(struct pci_driver *drv,
> ?????????????? }
> ?????????????? spin_unlock(&drv->dynids.lock);
>
> -???????????? if (!found_id)
> -???????????????????????????? found_id = pci_match_id(drv->id_table, dev);
> +???????????? if (found_id)
> +???????????????????????????? return found_id;

Something is broken in your mailer because this does not look like code
at all.

2021-06-14 08:21:22

by Max Gurtovoy

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding


On 6/14/2021 8:40 AM, Christoph Hellwig wrote:
> On Sun, Jun 13, 2021 at 11:19:46AM +0300, Max Gurtovoy wrote:
>> what about the following code ?
>>
>> @@ -152,12 +152,28 @@ static const struct pci_device_id
>> *pci_match_device(struct pci_driver *drv,
>> ?????????????? }
>> ?????????????? spin_unlock(&drv->dynids.lock);
>>
>> -???????????? if (!found_id)
>> -???????????????????????????? found_id = pci_match_id(drv->id_table, dev);
>> +???????????? if (found_id)
>> +???????????????????????????? return found_id;
> Something is broken in your mailer because this does not look like code
> at all.

Sorry, see the attached patches.



Attachments:
0001-PCI-add-flags-field-to-pci_device_id-structure.patch (6.05 kB)
0002-PCI-add-matching-checks-for-driver_override-binding.patch (5.12 kB)
Download all attachments

2021-06-14 15:31:58

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding

On Mon, Jun 14, 2021 at 11:18:32AM +0300, Max Gurtovoy wrote:
> * into a static list of equivalent device types,
> * instead of using it as a pointer.
> + * @flags: PCI flags of the driver. Bitmap of pci_id_flags enum.
> */
> struct pci_device_id {
> __u32 vendor, device; /* Vendor and device ID or PCI_ANY_ID*/
> __u32 subvendor, subdevice; /* Subsystem ID's or PCI_ANY_ID */
> __u32 class, class_mask; /* (class,subclass,prog-if) triplet */
> kernel_ulong_t driver_data; /* Data private to the driver */
> + __u32 flags;
> };

Isn't struct pci_device_id a userspace ABI due to MODULE_DEVICE_TABLE()?

2021-06-14 16:03:12

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding

On Mon, Jun 14, 2021 at 04:27:33PM +0100, Christoph Hellwig wrote:
> On Mon, Jun 14, 2021 at 11:18:32AM +0300, Max Gurtovoy wrote:
> > * into a static list of equivalent device types,
> > * instead of using it as a pointer.
> > + * @flags: PCI flags of the driver. Bitmap of pci_id_flags enum.
> > */
> > struct pci_device_id {
> > __u32 vendor, device; /* Vendor and device ID or PCI_ANY_ID*/
> > __u32 subvendor, subdevice; /* Subsystem ID's or PCI_ANY_ID */
> > __u32 class, class_mask; /* (class,subclass,prog-if) triplet */
> > kernel_ulong_t driver_data; /* Data private to the driver */
> > + __u32 flags;
> > };
>
> Isn't struct pci_device_id a userspace ABI due to MODULE_DEVICE_TABLE()?

Not that I can find, it isn't under include/uapi and the way to find
this information is by looking for symbols starting with "__mod_"

Debian Code Search says the only place with '"__mod_"' is in
file2alias.c at least

Do you know of something? If yes this file should be moved

Jason

2021-06-14 16:18:24

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding

On Mon, Jun 14, 2021 at 01:01:25PM -0300, Jason Gunthorpe wrote:
> > Isn't struct pci_device_id a userspace ABI due to MODULE_DEVICE_TABLE()?
>
> Not that I can find, it isn't under include/uapi and the way to find
> this information is by looking for symbols starting with "__mod_"
>
> Debian Code Search says the only place with '"__mod_"' is in
> file2alias.c at least
>
> Do you know of something? If yes this file should be moved

Seems lke file2alias.c is indeed the only consumer. So it is a
userspace ABI, but ony to a file included in the kernel tree.

2021-06-14 16:35:10

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding

On Mon, Jun 14, 2021 at 05:15:22PM +0100, Christoph Hellwig wrote:
> On Mon, Jun 14, 2021 at 01:01:25PM -0300, Jason Gunthorpe wrote:
> > > Isn't struct pci_device_id a userspace ABI due to MODULE_DEVICE_TABLE()?
> >
> > Not that I can find, it isn't under include/uapi and the way to find
> > this information is by looking for symbols starting with "__mod_"
> >
> > Debian Code Search says the only place with '"__mod_"' is in
> > file2alias.c at least
> >
> > Do you know of something? If yes this file should be moved
>
> Seems lke file2alias.c is indeed the only consumer. So it is a
> userspace ABI, but ony to a file included in the kernel tree.

As I understand it, things are tighter than that, it is only an API
between different parts of kbuild - so it is OK to change
it. module.alias is the uAPI this data gets marshaled into.

Jason

2021-06-14 18:44:38

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding

On Sun, 13 Jun 2021 11:19:46 +0300
Max Gurtovoy <[email protected]> wrote:

> On 6/9/2021 4:27 AM, Alex Williamson wrote:
> > On Tue, 8 Jun 2021 19:45:17 -0300
> > Jason Gunthorpe <[email protected]> wrote:
> >
> >> On Tue, Jun 08, 2021 at 03:26:43PM -0600, Alex Williamson wrote:
> >>>> drivers that specifically opt into this feature and the driver now has
> >>>> the opportunity to provide a proper match table that indicates what HW
> >>>> it can properly support. vfio-pci continues to support everything.
> >>> In doing so, this also breaks the new_id method for vfio-pci.
> >> Does it? How? The driver_override flag is per match entry not for the
> >> entire device so new_id added things will work the same as before as
> >> their new match entry's flags will be zero.
> > Hmm, that might have been a testing issue; combining driverctl with
> > manual new_id testing might have left a driver_override in place.
> >
> >>> Sorry, with so many userspace regressions, crippling the
> >>> driver_override interface with an assumption of such a narrow focus,
> >>> creating a vfio specific match flag, I don't see where this can go.
> >>> Thanks,
> >> On the other hand it overcomes all the objections from the last go
> >> round: how userspace figures out which driver to use with
> >> driver_override and integrating the universal driver into the scheme.
> >>
> >> pci_stub could be delt with by marking it for driver_override like
> >> vfio_pci.
> > By marking it a "vfio driver override"? :-\
> >
> >> But driverctl as a general tool working with any module is not really
> >> addressable.
> >>
> >> Is the only issue the blocking of the arbitary binding? That is not a
> >> critical peice of this, IIRC
> > We can't break userspace, which means new_id and driver_override need
> > to work as they do now. There are scads of driver binding scripts in
> > the wild, for vfio-pci and other drivers. We can't assume such a
> > narrow scope. Thanks,
>
> what about the following code ?
>
> @@ -152,12 +152,28 @@ static const struct pci_device_id
> *pci_match_device(struct pci_driver *drv,
>         }
>         spin_unlock(&drv->dynids.lock);
>
> -       if (!found_id)
> -               found_id = pci_match_id(drv->id_table, dev);
> +       if (found_id)
> +               return found_id;

a) A dynamic ID match always works regardless of driver override...

>
> -       /* driver_override will always match, send a dummy id */
> -       if (!found_id && dev->driver_override)
> +       found_id = pci_match_id(drv->id_table, dev);
> +       if (found_id) {
> +               /*
> +                * if we found id in the static table, we must fulfill the
> +                * matching flags (i.e. if PCI_ID_F_DRIVER_OVERRIDE flag is
> +                * set, driver_override should be provided).
> +                */
> +               bool is_driver_override =
> +                       (found_id->flags & PCI_ID_F_DRIVER_OVERRIDE) != 0;
> +               if ((is_driver_override && !dev->driver_override) ||

b) A static ID match fails if the driver provides an override flag and
the device does not have an override set, or...

> +                   (dev->driver_override && !is_driver_override))

c) The device has an override set and the driver does not support the
override flag.

> +                       return NULL;
> +       } else if (dev->driver_override) {
> +               /*
> +                * if we didn't find suitable id in the static table,
> +                * driver_override will still , send a dummy id
> +                */
>                 found_id = &pci_device_id_any;
> +       }
>
>         return found_id;
>  }
>
>
> dynamic ids (new_id) works as before.
>
> Old driver_override works as before.

This is deceptively complicated, but no, I don't believe it does. By
my understanding of c) an "old" driver can no longer use
driver_override for binding a known device. It seems that if we have a
static ID match, then we cannot have a driver_override set for the
device in such a case. This is a userspace regression.

> For "new" driver_override we must fulfill the new rules.

For override'able drivers, the static table is almost useless other
than using it for modules.alias support and potentially to provide
driver_data. As above, I find this all pretty confusing and I'd advise
trying to write a concise set of rules outlining the behavior of
driver_override vs dynamic IDs vs static IDs vs "override'able" driver
flags. I tried, I can't, it's convoluted and full of exceptions.
Thanks,

Alex

2021-06-14 23:15:22

by Max Gurtovoy

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding


On 6/14/2021 9:42 PM, Alex Williamson wrote:
> On Sun, 13 Jun 2021 11:19:46 +0300
> Max Gurtovoy <[email protected]> wrote:
>
>> On 6/9/2021 4:27 AM, Alex Williamson wrote:
>>> On Tue, 8 Jun 2021 19:45:17 -0300
>>> Jason Gunthorpe <[email protected]> wrote:
>>>
>>>> On Tue, Jun 08, 2021 at 03:26:43PM -0600, Alex Williamson wrote:
>>>>>> drivers that specifically opt into this feature and the driver now has
>>>>>> the opportunity to provide a proper match table that indicates what HW
>>>>>> it can properly support. vfio-pci continues to support everything.
>>>>> In doing so, this also breaks the new_id method for vfio-pci.
>>>> Does it? How? The driver_override flag is per match entry not for the
>>>> entire device so new_id added things will work the same as before as
>>>> their new match entry's flags will be zero.
>>> Hmm, that might have been a testing issue; combining driverctl with
>>> manual new_id testing might have left a driver_override in place.
>>>
>>>>> Sorry, with so many userspace regressions, crippling the
>>>>> driver_override interface with an assumption of such a narrow focus,
>>>>> creating a vfio specific match flag, I don't see where this can go.
>>>>> Thanks,
>>>> On the other hand it overcomes all the objections from the last go
>>>> round: how userspace figures out which driver to use with
>>>> driver_override and integrating the universal driver into the scheme.
>>>>
>>>> pci_stub could be delt with by marking it for driver_override like
>>>> vfio_pci.
>>> By marking it a "vfio driver override"? :-\
>>>
>>>> But driverctl as a general tool working with any module is not really
>>>> addressable.
>>>>
>>>> Is the only issue the blocking of the arbitary binding? That is not a
>>>> critical peice of this, IIRC
>>> We can't break userspace, which means new_id and driver_override need
>>> to work as they do now. There are scads of driver binding scripts in
>>> the wild, for vfio-pci and other drivers. We can't assume such a
>>> narrow scope. Thanks,
>> what about the following code ?
>>
>> @@ -152,12 +152,28 @@ static const struct pci_device_id
>> *pci_match_device(struct pci_driver *drv,
>>         }
>>         spin_unlock(&drv->dynids.lock);
>>
>> -       if (!found_id)
>> -               found_id = pci_match_id(drv->id_table, dev);
>> +       if (found_id)
>> +               return found_id;
> a) A dynamic ID match always works regardless of driver override...
>
>> -       /* driver_override will always match, send a dummy id */
>> -       if (!found_id && dev->driver_override)
>> +       found_id = pci_match_id(drv->id_table, dev);
>> +       if (found_id) {
>> +               /*
>> +                * if we found id in the static table, we must fulfill the
>> +                * matching flags (i.e. if PCI_ID_F_DRIVER_OVERRIDE flag is
>> +                * set, driver_override should be provided).
>> +                */
>> +               bool is_driver_override =
>> +                       (found_id->flags & PCI_ID_F_DRIVER_OVERRIDE) != 0;
>> +               if ((is_driver_override && !dev->driver_override) ||
> b) A static ID match fails if the driver provides an override flag and
> the device does not have an override set, or...
>
>> +                   (dev->driver_override && !is_driver_override))
> c) The device has an override set and the driver does not support the
> override flag.
>
>> +                       return NULL;
>> +       } else if (dev->driver_override) {
>> +               /*
>> +                * if we didn't find suitable id in the static table,
>> +                * driver_override will still , send a dummy id
>> +                */
>>                 found_id = &pci_device_id_any;
>> +       }
>>
>>         return found_id;
>>  }
>>
>>
>> dynamic ids (new_id) works as before.
>>
>> Old driver_override works as before.
> This is deceptively complicated, but no, I don't believe it does. By
> my understanding of c) an "old" driver can no longer use
> driver_override for binding a known device. It seems that if we have a
> static ID match, then we cannot have a driver_override set for the
> device in such a case. This is a userspace regression.

If I'll remove condition c) everyone will be happy ?

I really would like to end this ongoing discussion and finally have a
clear idea of what we want.

By clear I mean C code.

If we'll continue raising ideas we'll never reach our goal. And my goal
is the next merge window.

>
>> For "new" driver_override we must fulfill the new rules.
> For override'able drivers, the static table is almost useless other
> than using it for modules.alias support and potentially to provide
> driver_data. As above, I find this all pretty confusing and I'd advise
> trying to write a concise set of rules outlining the behavior of
> driver_override vs dynamic IDs vs static IDs vs "override'able" driver
> flags. I tried, I can't, it's convoluted and full of exceptions.
> Thanks,
>
> Alex
>

2021-06-15 15:01:30

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding

On Tue, 15 Jun 2021 02:12:15 +0300
Max Gurtovoy <[email protected]> wrote:

> On 6/14/2021 9:42 PM, Alex Williamson wrote:
> > On Sun, 13 Jun 2021 11:19:46 +0300
> > Max Gurtovoy <[email protected]> wrote:
> >
> >> On 6/9/2021 4:27 AM, Alex Williamson wrote:
> >>> On Tue, 8 Jun 2021 19:45:17 -0300
> >>> Jason Gunthorpe <[email protected]> wrote:
> >>>
> >>>> On Tue, Jun 08, 2021 at 03:26:43PM -0600, Alex Williamson wrote:
> >>>>>> drivers that specifically opt into this feature and the driver now has
> >>>>>> the opportunity to provide a proper match table that indicates what HW
> >>>>>> it can properly support. vfio-pci continues to support everything.
> >>>>> In doing so, this also breaks the new_id method for vfio-pci.
> >>>> Does it? How? The driver_override flag is per match entry not for the
> >>>> entire device so new_id added things will work the same as before as
> >>>> their new match entry's flags will be zero.
> >>> Hmm, that might have been a testing issue; combining driverctl with
> >>> manual new_id testing might have left a driver_override in place.
> >>>
> >>>>> Sorry, with so many userspace regressions, crippling the
> >>>>> driver_override interface with an assumption of such a narrow focus,
> >>>>> creating a vfio specific match flag, I don't see where this can go.
> >>>>> Thanks,
> >>>> On the other hand it overcomes all the objections from the last go
> >>>> round: how userspace figures out which driver to use with
> >>>> driver_override and integrating the universal driver into the scheme.
> >>>>
> >>>> pci_stub could be delt with by marking it for driver_override like
> >>>> vfio_pci.
> >>> By marking it a "vfio driver override"? :-\
> >>>
> >>>> But driverctl as a general tool working with any module is not really
> >>>> addressable.
> >>>>
> >>>> Is the only issue the blocking of the arbitary binding? That is not a
> >>>> critical peice of this, IIRC
> >>> We can't break userspace, which means new_id and driver_override need
> >>> to work as they do now. There are scads of driver binding scripts in
> >>> the wild, for vfio-pci and other drivers. We can't assume such a
> >>> narrow scope. Thanks,
> >> what about the following code ?
> >>
> >> @@ -152,12 +152,28 @@ static const struct pci_device_id
> >> *pci_match_device(struct pci_driver *drv,
> >>         }
> >>         spin_unlock(&drv->dynids.lock);
> >>
> >> -       if (!found_id)
> >> -               found_id = pci_match_id(drv->id_table, dev);
> >> +       if (found_id)
> >> +               return found_id;
> > a) A dynamic ID match always works regardless of driver override...
> >
> >> -       /* driver_override will always match, send a dummy id */
> >> -       if (!found_id && dev->driver_override)
> >> +       found_id = pci_match_id(drv->id_table, dev);
> >> +       if (found_id) {
> >> +               /*
> >> +                * if we found id in the static table, we must fulfill the
> >> +                * matching flags (i.e. if PCI_ID_F_DRIVER_OVERRIDE flag is
> >> +                * set, driver_override should be provided).
> >> +                */
> >> +               bool is_driver_override =
> >> +                       (found_id->flags & PCI_ID_F_DRIVER_OVERRIDE) != 0;
> >> +               if ((is_driver_override && !dev->driver_override) ||
> > b) A static ID match fails if the driver provides an override flag and
> > the device does not have an override set, or...
> >
> >> +                   (dev->driver_override && !is_driver_override))
> > c) The device has an override set and the driver does not support the
> > override flag.
> >
> >> +                       return NULL;
> >> +       } else if (dev->driver_override) {
> >> +               /*
> >> +                * if we didn't find suitable id in the static table,
> >> +                * driver_override will still , send a dummy id
> >> +                */
> >>                 found_id = &pci_device_id_any;
> >> +       }
> >>
> >>         return found_id;
> >>  }
> >>
> >>
> >> dynamic ids (new_id) works as before.
> >>
> >> Old driver_override works as before.
> > This is deceptively complicated, but no, I don't believe it does. By
> > my understanding of c) an "old" driver can no longer use
> > driver_override for binding a known device. It seems that if we have a
> > static ID match, then we cannot have a driver_override set for the
> > device in such a case. This is a userspace regression.
>
> If I'll remove condition c) everyone will be happy ?
>
> I really would like to end this ongoing discussion and finally have a
> clear idea of what we want.
>
> By clear I mean C code.
>
> If we'll continue raising ideas we'll never reach our goal. And my goal
> is the next merge window.

Bjorn would ultimately need to make the call on that, I don't see an
obvious regression if c) is dropped. pci-stub and pci-pf-stub should
be included in the proposal so we can better understand how creating a
"vfio" override in PCI-core plays out for other override types. Also I
don't think dynamic IDs should be handled uniquely, new_id_store()
should gain support for flags and userspace should be able to add new
dynamic ID with override-only matches to the table. Thanks,

Alex

2021-06-15 15:08:29

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding

On Tue, Jun 15, 2021 at 09:00:29AM -0600, Alex Williamson wrote:

> "vfio" override in PCI-core plays out for other override types. Also I
> don't think dynamic IDs should be handled uniquely, new_id_store()
> should gain support for flags and userspace should be able to add new
> dynamic ID with override-only matches to the table. Thanks,

Why? Once all the enforcement is stripped out the only purpose of the
new flag is to signal a different prepration of modules.alias - which
won't happen for the new_id path anyhow

Jason

2021-06-15 16:24:31

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding

On Tue, 15 Jun 2021 12:04:58 -0300
Jason Gunthorpe <[email protected]> wrote:

> On Tue, Jun 15, 2021 at 09:00:29AM -0600, Alex Williamson wrote:
>
> > "vfio" override in PCI-core plays out for other override types. Also I
> > don't think dynamic IDs should be handled uniquely, new_id_store()
> > should gain support for flags and userspace should be able to add new
> > dynamic ID with override-only matches to the table. Thanks,
>
> Why? Once all the enforcement is stripped out the only purpose of the
> new flag is to signal a different prepration of modules.alias - which
> won't happen for the new_id path anyhow

Because new_id allows the admin to insert a new pci_device_id which has
been extended to include a flags field and intentionally handling
dynamic IDs differently from static IDs seems like generally a bad
thing. For example, maybe the admin wants to specify nouveau as only
an override match for all 10de: class vga devices. Thanks,

Alex

2021-06-15 20:44:04

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding

On Tue, Jun 15, 2021 at 10:20:49AM -0600, Alex Williamson wrote:
> On Tue, 15 Jun 2021 12:04:58 -0300
> Jason Gunthorpe <[email protected]> wrote:
>
> > On Tue, Jun 15, 2021 at 09:00:29AM -0600, Alex Williamson wrote:
> >
> > > "vfio" override in PCI-core plays out for other override types. Also I
> > > don't think dynamic IDs should be handled uniquely, new_id_store()
> > > should gain support for flags and userspace should be able to add new
> > > dynamic ID with override-only matches to the table. Thanks,
> >
> > Why? Once all the enforcement is stripped out the only purpose of the
> > new flag is to signal a different prepration of modules.alias - which
> > won't happen for the new_id path anyhow
>
> Because new_id allows the admin to insert a new pci_device_id which has
> been extended to include a flags field and intentionally handling
> dynamic IDs differently from static IDs seems like generally a bad
> thing.

I'd agree with you if there was a functional difference at runtime,
but since that was all removed, I don't think we should touch new_id.

This ends up effectively being only a kbuild related patch that
changes how modules.alias is built.

Jason

2021-06-15 21:59:54

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding

On Tue, 15 Jun 2021 17:42:16 -0300
Jason Gunthorpe <[email protected]> wrote:

> On Tue, Jun 15, 2021 at 10:20:49AM -0600, Alex Williamson wrote:
> > On Tue, 15 Jun 2021 12:04:58 -0300
> > Jason Gunthorpe <[email protected]> wrote:
> >
> > > On Tue, Jun 15, 2021 at 09:00:29AM -0600, Alex Williamson wrote:
> > >
> > > > "vfio" override in PCI-core plays out for other override types. Also I
> > > > don't think dynamic IDs should be handled uniquely, new_id_store()
> > > > should gain support for flags and userspace should be able to add new
> > > > dynamic ID with override-only matches to the table. Thanks,
> > >
> > > Why? Once all the enforcement is stripped out the only purpose of the
> > > new flag is to signal a different prepration of modules.alias - which
> > > won't happen for the new_id path anyhow
> >
> > Because new_id allows the admin to insert a new pci_device_id which has
> > been extended to include a flags field and intentionally handling
> > dynamic IDs differently from static IDs seems like generally a bad
> > thing.
>
> I'd agree with you if there was a functional difference at runtime,
> but since that was all removed, I don't think we should touch new_id.
>
> This ends up effectively being only a kbuild related patch that
> changes how modules.alias is built.

But it wasn't all removed. The proposal had:

a) Short circuit the dynamic ID match
b) Fail a driver-override-only match without a driver_override
c) Fail a non-driver-override-only match with a driver_override

Max is only proposing removing c).

b) alone is a functional, runtime difference. As per my previous
example, think about using it as effectively an anti-match. Userspace
can create dynamic entries that will be matched before static entries
that can demote a driver to not be able to bind to a device
automatically. That's functional and useful, I can't think of any
other way we have to effectively remove a static match entry from a
driver. Thanks,

Alex

2021-06-15 23:03:44

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding

On Tue, Jun 15, 2021 at 03:59:00PM -0600, Alex Williamson wrote:
> On Tue, 15 Jun 2021 17:42:16 -0300
> Jason Gunthorpe <[email protected]> wrote:
>
> > On Tue, Jun 15, 2021 at 10:20:49AM -0600, Alex Williamson wrote:
> > > On Tue, 15 Jun 2021 12:04:58 -0300
> > > Jason Gunthorpe <[email protected]> wrote:
> > >
> > > > On Tue, Jun 15, 2021 at 09:00:29AM -0600, Alex Williamson wrote:
> > > >
> > > > > "vfio" override in PCI-core plays out for other override types. Also I
> > > > > don't think dynamic IDs should be handled uniquely, new_id_store()
> > > > > should gain support for flags and userspace should be able to add new
> > > > > dynamic ID with override-only matches to the table. Thanks,
> > > >
> > > > Why? Once all the enforcement is stripped out the only purpose of the
> > > > new flag is to signal a different prepration of modules.alias - which
> > > > won't happen for the new_id path anyhow
> > >
> > > Because new_id allows the admin to insert a new pci_device_id which has
> > > been extended to include a flags field and intentionally handling
> > > dynamic IDs differently from static IDs seems like generally a bad
> > > thing.
> >
> > I'd agree with you if there was a functional difference at runtime,
> > but since that was all removed, I don't think we should touch new_id.
> >
> > This ends up effectively being only a kbuild related patch that
> > changes how modules.alias is built.
>
> But it wasn't all removed. The proposal had:
>
> a) Short circuit the dynamic ID match
> b) Fail a driver-override-only match without a driver_override
> c) Fail a non-driver-override-only match with a driver_override
>
> Max is only proposing removing c).
>
> b) alone is a functional, runtime difference.

I would state b) differently:

b) Ignore the driver-override-only match entries in the ID table.

As if we look at new_id, I can't think of any reason for userspace to
add an entry to the ID table and then tell the kernel to ignore
it. If you want the kernel to ignore it then just don't add it in the
first place.

Do you have some other scenario in mind?

Jason

2021-06-15 23:24:48

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding

On Tue, 15 Jun 2021 20:00:17 -0300
Jason Gunthorpe <[email protected]> wrote:

> On Tue, Jun 15, 2021 at 03:59:00PM -0600, Alex Williamson wrote:
> > On Tue, 15 Jun 2021 17:42:16 -0300
> > Jason Gunthorpe <[email protected]> wrote:
> >
> > > On Tue, Jun 15, 2021 at 10:20:49AM -0600, Alex Williamson wrote:
> > > > On Tue, 15 Jun 2021 12:04:58 -0300
> > > > Jason Gunthorpe <[email protected]> wrote:
> > > >
> > > > > On Tue, Jun 15, 2021 at 09:00:29AM -0600, Alex Williamson wrote:
> > > > >
> > > > > > "vfio" override in PCI-core plays out for other override types. Also I
> > > > > > don't think dynamic IDs should be handled uniquely, new_id_store()
> > > > > > should gain support for flags and userspace should be able to add new
> > > > > > dynamic ID with override-only matches to the table. Thanks,
> > > > >
> > > > > Why? Once all the enforcement is stripped out the only purpose of the
> > > > > new flag is to signal a different prepration of modules.alias - which
> > > > > won't happen for the new_id path anyhow
> > > >
> > > > Because new_id allows the admin to insert a new pci_device_id which has
> > > > been extended to include a flags field and intentionally handling
> > > > dynamic IDs differently from static IDs seems like generally a bad
> > > > thing.
> > >
> > > I'd agree with you if there was a functional difference at runtime,
> > > but since that was all removed, I don't think we should touch new_id.
> > >
> > > This ends up effectively being only a kbuild related patch that
> > > changes how modules.alias is built.
> >
> > But it wasn't all removed. The proposal had:
> >
> > a) Short circuit the dynamic ID match
> > b) Fail a driver-override-only match without a driver_override
> > c) Fail a non-driver-override-only match with a driver_override
> >
> > Max is only proposing removing c).
> >
> > b) alone is a functional, runtime difference.
>
> I would state b) differently:
>
> b) Ignore the driver-override-only match entries in the ID table.

No, pci_match_device() returns NULL if a match is found that is marked
driver-override-only and a driver_override is not specified. That's
the same as no match at all. We don't then go on to search past that
match in the table, we fail to bind the driver. That's effectively an
anti-match when there's no driver_override on the device.

> As if we look at new_id, I can't think of any reason for userspace to
> add an entry to the ID table and then tell the kernel to ignore
> it. If you want the kernel to ignore it then just don't add it in the
> first place.
>
> Do you have some other scenario in mind?

Sure, what if I have two different GPUs in my system, one works fine
with the FOSS driver, the other requires a 3rd party driver. I don't
want to blacklist the FOSS driver, but I don't want it to claim the
other GPU. I can create an anti-match that effectively removes one GPU
from the FOSS driver unless it's bound with a driver_override.

I understand that's not your intended use case, but I think this allows
that and justifies handling a dynamic ID the same as a static ID.
Adding a field to pci_device_id, which is otherwise able to be fully
specified via new_id, except for this field, feels like a bug. Thanks,

Alex

2021-06-15 23:33:49

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding

On Tue, Jun 15, 2021 at 05:22:42PM -0600, Alex Williamson wrote:

> > > b) alone is a functional, runtime difference.
> >
> > I would state b) differently:
> >
> > b) Ignore the driver-override-only match entries in the ID table.
>
> No, pci_match_device() returns NULL if a match is found that is marked
> driver-override-only and a driver_override is not specified. That's
> the same as no match at all. We don't then go on to search past that
> match in the table, we fail to bind the driver. That's effectively an
> anti-match when there's no driver_override on the device.

anti-match isn't the intention. The deployment will have match tables
where all entires are either flags=0 or are driver-override-only.

I would say that mixed match tables make driver-override-only into an
anti-match is actually a minor bug in the patch.

The series isn't about adding some new anti-match scheme.

> I understand that's not your intended use case, but I think this allows
> that and justifies handling a dynamic ID the same as a static ID.
> Adding a field to pci_device_id, which is otherwise able to be fully
> specified via new_id, except for this field, feels like a bug. Thanks,

Okay, I see what you are saying clearly now.

Your example usage seems legit to me, but I really don't want to
entangle it with this series. It is a seperate idea, it can go as a
seperate work that uses the new flags and an updated new_id and
related parts by someone who wants it.

I hope you'll understand that having NVIDIA Mellanox persue what you
describe above is just not going to work..

Jason

2021-06-16 00:24:07

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding

On Tue, 15 Jun 2021 20:32:57 -0300
Jason Gunthorpe <[email protected]> wrote:

> On Tue, Jun 15, 2021 at 05:22:42PM -0600, Alex Williamson wrote:
>
> > > > b) alone is a functional, runtime difference.
> > >
> > > I would state b) differently:
> > >
> > > b) Ignore the driver-override-only match entries in the ID table.
> >
> > No, pci_match_device() returns NULL if a match is found that is marked
> > driver-override-only and a driver_override is not specified. That's
> > the same as no match at all. We don't then go on to search past that
> > match in the table, we fail to bind the driver. That's effectively an
> > anti-match when there's no driver_override on the device.
>
> anti-match isn't the intention. The deployment will have match tables
> where all entires are either flags=0 or are driver-override-only.

I'd expect pci-pf-stub to have one of each, an any-id with
override-only flag and the one device ID currently in the table with
no flag.

> I would say that mixed match tables make driver-override-only into an
> anti-match is actually a minor bug in the patch.
>
> The series isn't about adding some new anti-match scheme.
>
> > I understand that's not your intended use case, but I think this allows
> > that and justifies handling a dynamic ID the same as a static ID.
> > Adding a field to pci_device_id, which is otherwise able to be fully
> > specified via new_id, except for this field, feels like a bug. Thanks,
>
> Okay, I see what you are saying clearly now.
>
> Your example usage seems legit to me, but I really don't want to
> entangle it with this series. It is a seperate idea, it can go as a
> seperate work that uses the new flags and an updated new_id and
> related parts by someone who wants it.
>
> I hope you'll understand that having NVIDIA Mellanox persue what you
> describe above is just not going to work..

I understand that use case might hit a nerve, but I don't particularly
see why handling static and dynamic IDs consistently wrt to this new
flags field is controversial. Thanks,

Alex

2021-06-16 00:38:08

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding

On Tue, Jun 15, 2021 at 06:22:45PM -0600, Alex Williamson wrote:
> On Tue, 15 Jun 2021 20:32:57 -0300
> Jason Gunthorpe <[email protected]> wrote:
>
> > On Tue, Jun 15, 2021 at 05:22:42PM -0600, Alex Williamson wrote:
> >
> > > > > b) alone is a functional, runtime difference.
> > > >
> > > > I would state b) differently:
> > > >
> > > > b) Ignore the driver-override-only match entries in the ID table.
> > >
> > > No, pci_match_device() returns NULL if a match is found that is marked
> > > driver-override-only and a driver_override is not specified. That's
> > > the same as no match at all. We don't then go on to search past that
> > > match in the table, we fail to bind the driver. That's effectively an
> > > anti-match when there's no driver_override on the device.
> >
> > anti-match isn't the intention. The deployment will have match tables
> > where all entires are either flags=0 or are driver-override-only.
>
> I'd expect pci-pf-stub to have one of each, an any-id with
> override-only flag and the one device ID currently in the table with
> no flag.

Oh Hum. Actually I think this shows the anti-match behavior is
actually a bug.. :(

For something like pci_pf_stub_whitelist, if we add a
driver_override-only using the PCI any id then it effectively disables
new_id completely because the match search will alway find the
driver_override match first and stop searching. There is no chance to
see things new_id adds.

We have to fix this patch so flags isn't an anti-match to make it work
without user regression.

Jason

2021-06-17 02:37:10

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding

On Thu, Jun 17, 2021 at 02:28:36AM +0300, Max Gurtovoy wrote:
>
> On 6/16/2021 3:34 AM, Jason Gunthorpe wrote:
> > On Tue, Jun 15, 2021 at 06:22:45PM -0600, Alex Williamson wrote:
> > > On Tue, 15 Jun 2021 20:32:57 -0300
> > > Jason Gunthorpe <[email protected]> wrote:
> > >
> > > > On Tue, Jun 15, 2021 at 05:22:42PM -0600, Alex Williamson wrote:
> > > >
> > > > > > > b) alone is a functional, runtime difference.
> > > > > > I would state b) differently:
> > > > > >
> > > > > > b) Ignore the driver-override-only match entries in the ID table.
> > > > > No, pci_match_device() returns NULL if a match is found that is marked
> > > > > driver-override-only and a driver_override is not specified. That's
> > > > > the same as no match at all. We don't then go on to search past that
> > > > > match in the table, we fail to bind the driver. That's effectively an
> > > > > anti-match when there's no driver_override on the device.
> > > > anti-match isn't the intention. The deployment will have match tables
> > > > where all entires are either flags=0 or are driver-override-only.
> > > I'd expect pci-pf-stub to have one of each, an any-id with
> > > override-only flag and the one device ID currently in the table with
> > > no flag.
> > Oh Hum. Actually I think this shows the anti-match behavior is
> > actually a bug.. :(
> >
> > For something like pci_pf_stub_whitelist, if we add a
> > driver_override-only using the PCI any id then it effectively disables
> > new_id completely because the match search will alway find the
> > driver_override match first and stop searching. There is no chance to
> > see things new_id adds.
>
> Actually the dynamic table is the first table the driver search. So new_id
> works exactly the same AFAIU.

Oh, even better, so it isn't really an issue

> But you're right for static mixed table (I assumed that this will never
> happen I guess).

Me too, we could organize the driver-overrides to be last

> - found_id = pci_match_id(drv->id_table, dev);
> - if (found_id) {
> + ids = drv->id_table;
> + while ((found_id = pci_match_id(ids, dev))) {

Yeah, keep searching makes logical sense to me

> diff --git a/drivers/pci/pci-pf-stub.c b/drivers/pci/pci-pf-stub.c
> index 45855a5e9fca..49544ba9a7af 100644
> +++ b/drivers/pci/pci-pf-stub.c
> @@ -19,6 +19,7 @@
> */
> static const struct pci_device_id pci_pf_stub_whitelist[] = {
> { PCI_VDEVICE(AMAZON, 0x0053) },
> + { PCI_DEVICE_FLAGS(PCI_ANY_ID, PCI_ANY_ID,
> PCI_ID_F_STUB_DRIVER_OVERRIDE) }, /* match all by default (override) */
> /* required last entry */
> { 0 }

And we don't really want this change any more right? No reason to put
pci_stub in the module.alias file?

Thanks,
Jason

2021-06-17 02:38:06

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding

On Thu, Jun 17, 2021 at 02:42:46AM +0300, Max Gurtovoy wrote:

> Do you see a reason not adding this alias for stub drivers but
> adding it to vfio_pci drivers ?

It creates uABI without a userspace user and that is strongly
discouraged. The 'stub_pci:' prefix becomes fixed ABI.

Jason

2021-06-17 02:38:35

by Max Gurtovoy

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding


On 6/16/2021 3:34 AM, Jason Gunthorpe wrote:
> On Tue, Jun 15, 2021 at 06:22:45PM -0600, Alex Williamson wrote:
>> On Tue, 15 Jun 2021 20:32:57 -0300
>> Jason Gunthorpe <[email protected]> wrote:
>>
>>> On Tue, Jun 15, 2021 at 05:22:42PM -0600, Alex Williamson wrote:
>>>
>>>>>> b) alone is a functional, runtime difference.
>>>>> I would state b) differently:
>>>>>
>>>>> b) Ignore the driver-override-only match entries in the ID table.
>>>> No, pci_match_device() returns NULL if a match is found that is marked
>>>> driver-override-only and a driver_override is not specified. That's
>>>> the same as no match at all. We don't then go on to search past that
>>>> match in the table, we fail to bind the driver. That's effectively an
>>>> anti-match when there's no driver_override on the device.
>>> anti-match isn't the intention. The deployment will have match tables
>>> where all entires are either flags=0 or are driver-override-only.
>> I'd expect pci-pf-stub to have one of each, an any-id with
>> override-only flag and the one device ID currently in the table with
>> no flag.
> Oh Hum. Actually I think this shows the anti-match behavior is
> actually a bug.. :(
>
> For something like pci_pf_stub_whitelist, if we add a
> driver_override-only using the PCI any id then it effectively disables
> new_id completely because the match search will alway find the
> driver_override match first and stop searching. There is no chance to
> see things new_id adds.

Actually the dynamic table is the first table the driver search. So
new_id works exactly the same AFAIU.

But you're right for static mixed table (I assumed that this will never
happen I guess).

If we put the any_id_override id before the non_override AMAZON device
entry in the pci-pf-stub we'll fail with the matching to the AMAZON device.

What about the bellow untested addition (also remove condition c):

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 296de7bc9dc9..2d46f6cd96f7 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -136,7 +136,7 @@ static const struct pci_device_id
*pci_match_device(struct pci_driver *drv,
                                                    struct pci_dev *dev)
 {
        struct pci_dynid *dynid;
-       const struct pci_device_id *found_id = NULL;
+       const struct pci_device_id *found_id = NULL, *ids;

        /* When driver_override is set, only bind to the matching driver */
        if (dev->driver_override && strcmp(dev->driver_override,
drv->name))
@@ -155,8 +155,8 @@ static const struct pci_device_id
*pci_match_device(struct pci_driver *drv,
        if (found_id)
                return found_id;

-       found_id = pci_match_id(drv->id_table, dev);
-       if (found_id) {
+       ids = drv->id_table;
+       while ((found_id = pci_match_id(ids, dev))) {
                /*
                 * if we found id in the static table, we must fulfill the
                 * matching flags (i.e. if PCI_ID_F_DRIVER_OVERRIDE flag is
@@ -164,17 +164,19 @@ static const struct pci_device_id
*pci_match_device(struct pci_driver *drv,
                 */
                bool is_driver_override =
                        (found_id->flags & PCI_ID_F_DRIVER_OVERRIDE) != 0;
-               if ((is_driver_override && !dev->driver_override) ||
-                   (dev->driver_override && !is_driver_override))
-                       return NULL;
-       } else if (dev->driver_override) {
-               /*
-                * if we didn't find suitable id in the static table,
-                * driver_override will still , send a dummy id
-                */
-               found_id = &pci_device_id_any;
+               if (is_driver_override && !dev->driver_override)
+                       ids = found_id++; /* continue searching */
+               else
+                       break;
        }

+       /*
+        * if no static match, driver_override will always match, send a
dummy
+        * id.
+        */
+       if (!found_id && dev->driver_override)
+               found_id = &pci_device_id_any;
+
        return found_id;
 }

diff --git a/drivers/pci/pci-pf-stub.c b/drivers/pci/pci-pf-stub.c
index 45855a5e9fca..49544ba9a7af 100644
--- a/drivers/pci/pci-pf-stub.c
+++ b/drivers/pci/pci-pf-stub.c
@@ -19,6 +19,7 @@
  */
 static const struct pci_device_id pci_pf_stub_whitelist[] = {
        { PCI_VDEVICE(AMAZON, 0x0053) },
+       { PCI_DEVICE_FLAGS(PCI_ANY_ID, PCI_ANY_ID,
PCI_ID_F_STUB_DRIVER_OVERRIDE) }, /* match all by default (override) */
        /* required last entry */
        { 0 }
 };


>
> We have to fix this patch so flags isn't an anti-match to make it work
> without user regression.
>
> Jason

2021-06-17 02:39:42

by Max Gurtovoy

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding


On 6/17/2021 2:44 AM, Jason Gunthorpe wrote:
> On Thu, Jun 17, 2021 at 02:42:46AM +0300, Max Gurtovoy wrote:
>
>> Do you see a reason not adding this alias for stub drivers but
>> adding it to vfio_pci drivers ?
> It creates uABI without a userspace user and that is strongly
> discouraged. The 'stub_pci:' prefix becomes fixed ABI.
so is it better to have "pci:v*d*sv*sd*bc*sc*i*" for stub drivers ? or
not adding alias at all if stub flag is set ?
>
> Jason

2021-06-17 02:39:58

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding

On Thu, Jun 17, 2021 at 02:51:09AM +0300, Max Gurtovoy wrote:
>
> On 6/17/2021 2:44 AM, Jason Gunthorpe wrote:
> > On Thu, Jun 17, 2021 at 02:42:46AM +0300, Max Gurtovoy wrote:
> >
> > > Do you see a reason not adding this alias for stub drivers but
> > > adding it to vfio_pci drivers ?
> > It creates uABI without a userspace user and that is strongly
> > discouraged. The 'stub_pci:' prefix becomes fixed ABI.
> so is it better to have "pci:v*d*sv*sd*bc*sc*i*" for stub drivers ?

No, we don't want to convey any new information about stub drivers to
userspace.

> or not adding alias at all if stub flag is set ?

Yes, just don't change it at all, IMHO.

Jason

2021-06-17 02:40:24

by Max Gurtovoy

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding


On 6/17/2021 2:33 AM, Jason Gunthorpe wrote:
> On Thu, Jun 17, 2021 at 02:28:36AM +0300, Max Gurtovoy wrote:
>> On 6/16/2021 3:34 AM, Jason Gunthorpe wrote:
>>> On Tue, Jun 15, 2021 at 06:22:45PM -0600, Alex Williamson wrote:
>>>> On Tue, 15 Jun 2021 20:32:57 -0300
>>>> Jason Gunthorpe <[email protected]> wrote:
>>>>
>>>>> On Tue, Jun 15, 2021 at 05:22:42PM -0600, Alex Williamson wrote:
>>>>>
>>>>>>>> b) alone is a functional, runtime difference.
>>>>>>> I would state b) differently:
>>>>>>>
>>>>>>> b) Ignore the driver-override-only match entries in the ID table.
>>>>>> No, pci_match_device() returns NULL if a match is found that is marked
>>>>>> driver-override-only and a driver_override is not specified. That's
>>>>>> the same as no match at all. We don't then go on to search past that
>>>>>> match in the table, we fail to bind the driver. That's effectively an
>>>>>> anti-match when there's no driver_override on the device.
>>>>> anti-match isn't the intention. The deployment will have match tables
>>>>> where all entires are either flags=0 or are driver-override-only.
>>>> I'd expect pci-pf-stub to have one of each, an any-id with
>>>> override-only flag and the one device ID currently in the table with
>>>> no flag.
>>> Oh Hum. Actually I think this shows the anti-match behavior is
>>> actually a bug.. :(
>>>
>>> For something like pci_pf_stub_whitelist, if we add a
>>> driver_override-only using the PCI any id then it effectively disables
>>> new_id completely because the match search will alway find the
>>> driver_override match first and stop searching. There is no chance to
>>> see things new_id adds.
>> Actually the dynamic table is the first table the driver search. So new_id
>> works exactly the same AFAIU.
> Oh, even better, so it isn't really an issue
>
>> But you're right for static mixed table (I assumed that this will never
>> happen I guess).
> Me too, we could organize the driver-overrides to be last

Yes we could, but in 2 years from now I'll forget this rule :)

And others may not be aware of it.

>
>> - found_id = pci_match_id(drv->id_table, dev);
>> - if (found_id) {
>> + ids = drv->id_table;
>> + while ((found_id = pci_match_id(ids, dev))) {
> Yeah, keep searching makes logical sense to me
>
>> diff --git a/drivers/pci/pci-pf-stub.c b/drivers/pci/pci-pf-stub.c
>> index 45855a5e9fca..49544ba9a7af 100644
>> +++ b/drivers/pci/pci-pf-stub.c
>> @@ -19,6 +19,7 @@
>> */
>> static const struct pci_device_id pci_pf_stub_whitelist[] = {
>> { PCI_VDEVICE(AMAZON, 0x0053) },
>> + { PCI_DEVICE_FLAGS(PCI_ANY_ID, PCI_ANY_ID,
>> PCI_ID_F_STUB_DRIVER_OVERRIDE) }, /* match all by default (override) */
>> /* required last entry */
>> { 0 }
> And we don't really want this change any more right? No reason to put
> pci_stub in the module.alias file?

I actually did it in the patches I attached earlier.

It will look like:

stub_pci:v*d*sv*sd*bc*sc*i*

pci:v00001D0Fd00000053sv*sd*bc*sc*i*

I think it's good practice to avoid matching automatically and auto
loading any_id_override and vfio_override drivers in general.

Do you see a reason not adding this alias for stub drivers but adding it
to vfio_pci drivers ?

>
> Thanks,
> Jason

2021-06-20 14:52:04

by Max Gurtovoy

[permalink] [raw]
Subject: Re: [PATCH 09/11] PCI: add matching checks for driver_override binding


On 6/17/2021 2:56 AM, Jason Gunthorpe wrote:
> On Thu, Jun 17, 2021 at 02:51:09AM +0300, Max Gurtovoy wrote:
>> On 6/17/2021 2:44 AM, Jason Gunthorpe wrote:
>>> On Thu, Jun 17, 2021 at 02:42:46AM +0300, Max Gurtovoy wrote:
>>>
>>>> Do you see a reason not adding this alias for stub drivers but
>>>> adding it to vfio_pci drivers ?
>>> It creates uABI without a userspace user and that is strongly
>>> discouraged. The 'stub_pci:' prefix becomes fixed ABI.
>> so is it better to have "pci:v*d*sv*sd*bc*sc*i*" for stub drivers ?
> No, we don't want to convey any new information about stub drivers to
> userspace.
>
>> or not adding alias at all if stub flag is set ?
> Yes, just don't change it at all, IMHO.

Ok.

I've attached 2 patches that I would like to agree on before we'll send
the V5.

They include the pci-pf-stub additions and keep searching for matching
static ids as we discussed.

Max.

>
> Jason


Attachments:
0001-PCI-add-flags-field-to-pci_device_id-structure.patch (6.09 kB)
0002-PCI-add-matching-checks-for-driver_override-binding.patch (6.08 kB)
Download all attachments
Subject: RE: [RFC PATCH v4 00/11] Introduce vfio-pci-core subsystem

Hi Max/ Yishai,

(Sorry I picked this thread instead of the [1] here as I don't have that
in my mailbox)

I see that an update to this series has been posted by Yishai [1] and it mentions
about a branch with all relevant patches,

" A preview of all the patches can be seen here:
https://github.com/jgunthorpe/linux/commits/mlx5_vfio_pci"

But sorry I couldn't find the patches in the branch above. Could you
please check and let me know.

Thanks,
Shameer

[1] https://lore.kernel.org/kvm/[email protected]/#R


> -----Original Message-----
> From: Max Gurtovoy [mailto:[email protected]]
> Sent: 03 June 2021 17:08
> To: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; Shameerali
> Kolothum Thodi <[email protected]>; liulongfang
> <[email protected]>; [email protected]; Max Gurtovoy
> <[email protected]>
> Subject: [RFC PATCH v4 00/11] Introduce vfio-pci-core subsystem
>
> Hi Alex, Cornelia, Jason and Co,
>
> This series split the vfio_pci driver into 2 parts: pci drivers and a
> subsystem driver that will also be library of code. The main pci driver,
> vfio_pci.ko will be used as before and it will bind to the subsystem
> driver vfio_pci_core.ko to register to the VFIO subsystem.
>
> This series is coming to solve some of the issues that were raised in
> the previous attempts for extending vfio-pci for vendor specific
> functionality:
> 1. https://lkml.org/lkml/2020/5/17/376 by Yan Zhao.
> 2. https://www.spinics.net/lists/kernel/msg3903996.html by Longfang Liu
>
> This subsystem framework will also ease on adding new vendor specific
> functionality to VFIO devices in the future by allowing another module
> to provide the pci_driver that can setup number of details before
> registering to VFIO subsystem (such as inject its own operations).
>
> This series also extends the "driver_override" mechanism. We added a flag
> for pci drivers that will declare themselves as "driver_override" capable
> and only declared drivers can use this mechanism in the PCI subsystem.
> Other drivers will not be able to bind to devices that use "driver_override".
> Also, the PCI driver matching will always look for ID table and will never
> generate dummy "match_all" ID table in the PCI subsystem layer. In this
> way, we ensure deterministic behaviour with no races with the original
> pci drivers. In order to get the best match for "driver_override" drivers,
> one can create a userspace program (example can be found at
> https://github.com/maxgurtovoy/linux_tools/blob/main/vfio/bind_vfio_pci_dr
> iver.py)
> that find the 'best match' according to simple algorithm: "the driver
> with the fewest '*' matches wins."
> For example, the vfio-pci driver will match to any pci device. So it
> will have the maximal '*' matches (for all matching IDs: vendor, device,
> subvendor, ...).
> In case we are looking for a match to mlx5 based device, we'll have a
> match to vfio-pci.ko and mlx5-vfio-pci.ko. We'll prefer mlx5-vfio-pci.ko
> since it will have less '*' matches (probably vendor and device IDs will
> match). This will work in the future for NVMe/Virtio devices that can
> match according to a class code or other criteria.
>
> The main goal of this series is to agree on the vfio_pci module split and the
> "driver_override" extensions. The follow-up version will include an extended
> mlx5_vfio_pci driver that will support VF suspend/resume as well.
>
> This series applied cleanly on top of vfio reflck re-design (still haven't sent
> for review) and can be found at:
> https://github.com/Mellanox/NVMEoF-P2P/tree/vfio-v4-external.
>
> Max Gurtovoy (11):
> vfio-pci: rename vfio_pci.c to vfio_pci_core.c
> vfio-pci: rename vfio_pci_private.h to vfio_pci_core.h
> vfio-pci: rename vfio_pci_device to vfio_pci_core_device
> vfio-pci: rename ops functions to fit core namings
> vfio-pci: include vfio header in vfio_pci_core.h
> vfio-pci: introduce vfio_pci.c
> vfio-pci: move igd initialization to vfio_pci.c
> PCI: add flags field to pci_device_id structure
> PCI: add matching checks for driver_override binding
> vfio-pci: introduce vfio_pci_core subsystem driver
> mlx5-vfio-pci: add new vfio_pci driver for mlx5 devices
>
> Documentation/ABI/testing/sysfs-bus-pci | 6 +-
> Documentation/PCI/pci.rst | 1 +
> drivers/pci/pci-driver.c | 22 +-
> drivers/vfio/pci/Kconfig | 27 +-
> drivers/vfio/pci/Makefile | 12 +-
> drivers/vfio/pci/mlx5_vfio_pci.c | 130 +
> drivers/vfio/pci/vfio_pci.c | 2329 +----------------
> drivers/vfio/pci/vfio_pci_config.c | 70 +-
> drivers/vfio/pci/vfio_pci_core.c | 2239 ++++++++++++++++
> drivers/vfio/pci/vfio_pci_igd.c | 16 +-
> drivers/vfio/pci/vfio_pci_intrs.c | 42 +-
> drivers/vfio/pci/vfio_pci_rdwr.c | 18 +-
> drivers/vfio/pci/vfio_pci_zdev.c | 4 +-
> include/linux/mod_devicetable.h | 9 +
> include/linux/pci.h | 27 +
> .../linux/vfio_pci_core.h | 93 +-
> scripts/mod/devicetable-offsets.c | 1 +
> scripts/mod/file2alias.c | 8 +-
> 18 files changed, 2695 insertions(+), 2359 deletions(-)
> create mode 100644 drivers/vfio/pci/mlx5_vfio_pci.c
> create mode 100644 drivers/vfio/pci/vfio_pci_core.c
> rename drivers/vfio/pci/vfio_pci_private.h => include/linux/vfio_pci_core.h
> (56%)
>
> --
> 2.21.0


2021-07-30 11:57:13

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [RFC PATCH v4 00/11] Introduce vfio-pci-core subsystem

On Fri, Jul 30, 2021 at 07:53:00AM +0000, Shameerali Kolothum Thodi wrote:
> Hi Max/ Yishai,
>
> (Sorry I picked this thread instead of the [1] here as I don't have that
> in my mailbox)
>
> I see that an update to this series has been posted by Yishai [1] and it mentions
> about a branch with all relevant patches,
>
> " A preview of all the patches can be seen here:
> https://github.com/jgunthorpe/linux/commits/mlx5_vfio_pci"
>
> But sorry I couldn't find the patches in the branch above. Could you
> please check and let me know.

I fixed it

Jason