- Thomas, thank you for the previous reviews. I've made the appropriate
changes based on your feedback. Please take a look again at patches 5 and
11 for IMS setup. I'd really appreciate an ack if they look good.
- Dan and Vinod, I'd really appreciate it if you could review patches 1-3 for
the idxd driver bits and provide an ack if they look good.
- Alex and Kirti, I'd very much appreciate it if you could review the series
and consider it for inclusion in the 5.13 kernel if everything looks good.
Thank you all!
v5:
- Split out non driver IMS code to its own series.
- Removed device DVSEC detection code.
- Reworked irq_entries for IMS so emulated vector is also included.
- Reworked vidxd_send_interrupt() to take irq_entry directly (data ready for
consumption) (Thomas)
- Removed pointer to msi_entry in irq_entries (Thomas)
- Removed irq_domain check on free entries (Thomas)
- Split out irqbypass management code (Thomas)
- Fix EXPORT_SYMBOL to EXPORT_SYMBOL_GPL (Thomas)
- Refactored code to use auxiliary bus (Jason)
v4:
dev-msi:
- Make interrupt remapping code more readable (Thomas)
- Add flush writes to unmask/write and reset ims slots (Thomas)
- Interrupt Message Storm -> Interrupt Message Store (Thomas)
- Merge in pasid programming code. (Thomas)
mdev:
- Fixed up domain assignment (Thomas)
- Define magic numbers (Thomas)
- Move siov detection code to PCI common (Thomas)
- Remove duplicated MSI entry info (Thomas)
- Convert code to use ims_slot (Thomas)
- Add explanation of pasid programming for IMS entry (Thomas)
- Add int handle release support due to spec 1.1 update.
v3:
Dev-msi:
- No need to add support for 2 different dev-msi irq domains; a common
one can be used for both cases (with IR enabled/disabled)
- Add arch specific function to specify additions to msi_prepare callback
instead of making the callback a weak function
- Call platform ops directly instead of a wrapper function
- Make mask/unmask callbacks as void functions
- dev->msi_domain should be updated at the device driver level before
calling dev_msi_alloc_irqs()
- dev_msi_alloc/free_irqs() cannot be used for PCI devices
- Followed the generic layering scheme: infrastructure bits -> arch
bits -> enabling bits
Mdev:
- Remove set kvm group notifier (Yan Zhao)
- Fix VFIO irq trigger removal (Yan Zhao)
- Add mmio read flush to ims mask (Jason)
v2:
IMS (now dev-msi):
- With recommendations from Jason/Thomas/Dan on making IMS more generic:
- Pass a non-PCI generic device (struct device) for IMS management instead of
mdev
- Remove all references to mdev and symbol_get/put
- Remove all references to IMS in common code and replace with dev-msi
- Remove dynamic allocation of platform-msi interrupts: no groups, no
new msi list or list helpers
- Create a generic dev-msi domain with and without interrupt remapping
enabled.
- Introduce dev_msi_domain_alloc_irqs and dev_msi_domain_free_irqs apis
mdev:
- Removed unrelated bits from SVA enabling that are not necessary for
the submission. (Kevin)
- Restructured entire mdev driver series to make reviewing easier (Kevin)
- Made rw emulation more robust (Kevin)
- Removed uuid wq type and added single dedicated wq type (Kevin)
- Locking fixes for vdev (Yan Zhao)
- VFIO MSIX trigger fixes (Yan Zhao)
This code series will match the support of the 5.6 kernel (stage 1) driver
but on the guest.
The code depends on the IMS enabling code:
https://lore.kernel.org/linux-pci/[email protected]/T/#t
Stage 1 of the driver has been accepted in the v5.6 kernel. It supports
dedicated workqueue (wq) without Shared Virtual Memory (SVM) support.
Stage 2 of the driver supports shared wq and SVM and has been accepted in
kernel v5.11.
The VFIO mediated device framework allows vendor drivers to wrap a portion
of device resources into virtual devices (mdev). Each mdev can be assigned
to a different guest using the same set of VFIO uAPIs as assigning a
physical device. Access to the mdev resources is served with mixed
policies. For example, vendor drivers typically mark the data-path interface
as pass-through for fast guest operations, and then trap-and-mediate the
control-path interface to avoid undesired interference between mdevs. Some
level of emulation is necessary behind the vfio mdev to compose the virtual
device interface.
This series brings mdev to the idxd driver to enable Intel Scalable IOV
(SIOV), a hardware-assisted mediated pass-through technology. SIOV makes
each DSA wq independently assignable through PASID-granular resource/DMA
isolation. It helps improve scalability and reduces mediation complexity
compared to purely software-based mdev implementations. Each assigned wq is
configured by the host and exposed to the guest in a read-only configuration
mode, which allows the guest to use the wq without additional setup. This
design greatly reduces the emulation bits, focusing them on handling commands
from guests.
There are two possible avenues to support virtual device composition:
1. VFIO mediated device (mdev) or 2. User space DMA through a char device
(or UACCE). Given the small amount of emulation needed to satisfy our needs
and the fact that VFIO mdev already has the infrastructure to support device
passthrough, we feel that VFIO mdev is the better route. For a more in-depth
explanation, see the documentation in Documentation/driver-api/vfio/mdev-idxd.rst.
Introducing the “1dwq-v1” mdev type. This mdev type allows
allocation of a single dedicated wq from the available dedicated wqs. After
a workqueue (wq) is enabled, the user generates a uuid. On mdev
creation, the mdev driver code will find a dwq depending on the mdev
type. When the create operation is successful, the user generated uuid
can be passed to qemu. When the guest boots up, it should discover a
DSA device during PCI discovery.
For example, for the “1dwq-v1” type:
1. Enable a wq with the “mdev” wq type.
2. Generate a uuid.
3. The uuid is written to the mdev class sysfs path:
echo $UUID > /sys/class/mdev_bus/0000\:00\:0a.0/mdev_supported_types/idxd-1dwq-v1/create
4. Pass the following parameter to qemu:
"-device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:00:0a.0/$UUID"
The wq exported through mdev will have the read-only config bit set
for configuration. This means that the device does not require the
typical configuration. After enabling the device, the user must set the
wq type and name. That is all that is necessary to enable the wq and start
using it. The single wq configuration is not the only way to create the
mdev; multi-wq support for mdev is planned as future work.
The mdev utilizes Interrupt Message Store, or IMS[3], a device-specific
MSI implementation, instead of MSI-X for guest interrupts. This
preserves MSI-X for host usage and also allows a significantly larger
number of interrupt vectors for guest usage.
The idxd driver implements IMS as on-device memory mapped unified
storage. Each interrupt message is stored as a DWORD size data payload
and a 64-bit address (same as MSI-X). Access to the IMS is through the
host idxd driver.
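For illustration, a minimal sketch of what a single slot in that unified
storage could look like, assuming the layout described above (64-bit address
plus a DWORD data payload); the ctrl word and the struct name are assumptions
for this sketch, not taken verbatim from the generic IMS series:

	struct ims_slot_sketch {
		u32 address_lo;	/* message address, lower 32 bits */
		u32 address_hi;	/* message address, upper 32 bits */
		u32 data;	/* DWORD message data payload */
		u32 ctrl;	/* assumed: per-slot control, e.g. mask state */
	} __packed;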
The idxd driver makes use of the generic IMS irq chip and domain which
stores the interrupt messages as an array in device memory. Allocation and
freeing of interrupts happens via the generic msi_domain_alloc/free_irqs()
interface. One only needs to ensure the interrupt domain is stored in
the underlying device struct.
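A condensed sketch of that flow, drawn from the host init and mdev setup
paths later in this series (error handling trimmed):

	/* Host driver: create the IMS domain over the device slot array. */
	struct ims_array_info ims_info = {
		.max_slots = idxd->ims_size,
		.slots = idxd->reg_base + idxd->ims_offset,
	};
	idxd->ims_domain = pci_ims_array_create_msi_irq_domain(idxd->pdev, &ims_info);

	/* Per mdev: store the domain in the device struct, then allocate. */
	dev_set_msi_domain(mdev_dev(mdev), idxd->ims_domain);
	msi_domain_alloc_irqs(idxd->ims_domain, mdev_dev(mdev), VIDXD_MAX_MSIX_VECS - 1);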
[1]: https://lore.kernel.org/lkml/157965011794.73301.15960052071729101309.stgit@djiang5-desk3.ch.intel.com/
[2]: https://software.intel.com/en-us/articles/intel-sdm
[3]: https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
[4]: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification
[5]: https://01.org/blogs/2019/introducing-intel-data-streaming-accelerator
[6]: https://intel.github.io/idxd/
[7]: https://github.com/intel/idxd-driver idxd-stage2.5
---
Dave Jiang (14):
vfio/mdev: idxd: add theory of operation documentation for idxd mdev
dmaengine: idxd: add IMS detection in base driver
dmaengine: idxd: add device support functions in prep for mdev
vfio/mdev: idxd: Add auxialary device plumbing for idxd mdev support
vfio/mdev: idxd: add basic mdev registration and helper functions
vfio/mdev: idxd: add mdev type as a new wq type
vfio/mdev: idxd: add 1dwq-v1 mdev type
vfio/mdev: idxd: add emulation rw routines
vfio/mdev: idxd: prep for virtual device commands
vfio/mdev: idxd: virtual device commands emulation
vfio/mdev: idxd: ims setup for the vdcm
vfio/mdev: idxd: add irq bypass for IMS vectors
vfio/mdev: idxd: add new wq state for mdev
vfio/mdev: idxd: add error notification from host driver to mediated device
.../ABI/stable/sysfs-driver-dma-idxd | 6 +
MAINTAINERS | 8 +-
drivers/dma/idxd/Makefile | 2 +
drivers/dma/idxd/cdev.c | 6 +-
drivers/dma/idxd/device.c | 137 +-
drivers/dma/idxd/idxd.h | 48 +-
drivers/dma/idxd/init.c | 98 +-
drivers/dma/idxd/irq.c | 8 +-
drivers/dma/idxd/registers.h | 36 +-
drivers/dma/idxd/sysfs.c | 33 +-
drivers/vfio/mdev/Kconfig | 10 +
drivers/vfio/mdev/Makefile | 1 +
drivers/vfio/mdev/idxd/Makefile | 4 +
drivers/vfio/mdev/idxd/mdev.c | 1295 +++++++++++++++++
drivers/vfio/mdev/idxd/mdev.h | 119 ++
drivers/vfio/mdev/idxd/vdev.c | 1014 +++++++++++++
drivers/vfio/mdev/idxd/vdev.h | 28 +
include/uapi/linux/idxd.h | 2 +
kernel/irq/msi.c | 2 +
19 files changed, 2814 insertions(+), 43 deletions(-)
create mode 100644 drivers/vfio/mdev/idxd/Makefile
create mode 100644 drivers/vfio/mdev/idxd/mdev.c
create mode 100644 drivers/vfio/mdev/idxd/mdev.h
create mode 100644 drivers/vfio/mdev/idxd/vdev.c
create mode 100644 drivers/vfio/mdev/idxd/vdev.h
--
Add device support helper functions in preparation for adding VFIO
mdev support.
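The pasid/priv helpers added here assert the device lock, so a caller is
expected to look roughly like this (condensed from the vdcm wq enable path
later in the series):

	unsigned long flags;

	spin_lock_irqsave(&idxd->dev_lock, flags);
	idxd_wq_setup_pasid(wq, pasid);	/* program the PASID field of WQCFG */
	idxd_wq_setup_priv(wq, 1);	/* set the privileged mode bit */
	spin_unlock_irqrestore(&idxd->dev_lock, flags);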
Signed-off-by: Dave Jiang <[email protected]>
---
drivers/dma/idxd/device.c | 61 ++++++++++++++++++++++++++++++++++++++++++
drivers/dma/idxd/idxd.h | 4 +++
drivers/dma/idxd/registers.h | 3 +-
3 files changed, 67 insertions(+), 1 deletion(-)
diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index d6c447d09a6f..2491b27c8125 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -306,6 +306,30 @@ void idxd_wq_unmap_portal(struct idxd_wq *wq)
devm_iounmap(dev, wq->portal);
}
+int idxd_wq_abort(struct idxd_wq *wq)
+{
+ struct idxd_device *idxd = wq->idxd;
+ struct device *dev = &idxd->pdev->dev;
+ u32 operand, status;
+
+ dev_dbg(dev, "Abort WQ %d\n", wq->id);
+ if (wq->state != IDXD_WQ_ENABLED) {
+ dev_dbg(dev, "WQ %d not active\n", wq->id);
+ return -ENXIO;
+ }
+
+ operand = BIT(wq->id % 16) | ((wq->id / 16) << 16);
+ dev_dbg(dev, "cmd: %u operand: %#x\n", IDXD_CMD_ABORT_WQ, operand);
+ idxd_cmd_exec(idxd, IDXD_CMD_ABORT_WQ, operand, &status);
+ if (status != IDXD_CMDSTS_SUCCESS) {
+ dev_dbg(dev, "WQ abort failed: %#x\n", status);
+ return -ENXIO;
+ }
+
+ dev_dbg(dev, "WQ %d aborted\n", wq->id);
+ return 0;
+}
+
int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid)
{
struct idxd_device *idxd = wq->idxd;
@@ -412,6 +436,32 @@ void idxd_wq_quiesce(struct idxd_wq *wq)
percpu_ref_exit(&wq->wq_active);
}
+void idxd_wq_setup_pasid(struct idxd_wq *wq, int pasid)
+{
+ struct idxd_device *idxd = wq->idxd;
+ int offset;
+
+ lockdep_assert_held(&idxd->dev_lock);
+
+ /* PASID fields are 8 bytes into the WQCFG register */
+ offset = WQCFG_OFFSET(idxd, wq->id, WQCFG_PASID_IDX);
+ wq->wqcfg->pasid = pasid;
+ iowrite32(wq->wqcfg->bits[WQCFG_PASID_IDX], idxd->reg_base + offset);
+}
+
+void idxd_wq_setup_priv(struct idxd_wq *wq, int priv)
+{
+ struct idxd_device *idxd = wq->idxd;
+ int offset;
+
+ lockdep_assert_held(&idxd->dev_lock);
+
+ /* priv field is 8 bytes into the WQCFG register */
+ offset = WQCFG_OFFSET(idxd, wq->id, WQCFG_PRIV_IDX);
+ wq->wqcfg->priv = !!priv;
+ iowrite32(wq->wqcfg->bits[WQCFG_PRIV_IDX], idxd->reg_base + offset);
+}
+
/* Device control bits */
static inline bool idxd_is_enabled(struct idxd_device *idxd)
{
@@ -599,6 +649,17 @@ void idxd_device_drain_pasid(struct idxd_device *idxd, int pasid)
dev_dbg(dev, "pasid %d drained\n", pasid);
}
+void idxd_device_abort_pasid(struct idxd_device *idxd, int pasid)
+{
+ struct device *dev = &idxd->pdev->dev;
+ u32 operand;
+
+ operand = pasid;
+ dev_dbg(dev, "cmd: %u operand: %#x\n", IDXD_CMD_ABORT_PASID, operand);
+ idxd_cmd_exec(idxd, IDXD_CMD_ABORT_PASID, operand, NULL);
+ dev_dbg(dev, "pasid %d aborted\n", pasid);
+}
+
int idxd_device_request_int_handle(struct idxd_device *idxd, int idx, int *handle,
enum idxd_interrupt_type irq_type)
{
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 90c9458903e1..a2438b3166db 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -350,6 +350,7 @@ void idxd_device_cleanup(struct idxd_device *idxd);
int idxd_device_config(struct idxd_device *idxd);
void idxd_device_wqs_clear_state(struct idxd_device *idxd);
void idxd_device_drain_pasid(struct idxd_device *idxd, int pasid);
+void idxd_device_abort_pasid(struct idxd_device *idxd, int pasid);
int idxd_device_load_config(struct idxd_device *idxd);
int idxd_device_request_int_handle(struct idxd_device *idxd, int idx, int *handle,
enum idxd_interrupt_type irq_type);
@@ -369,6 +370,9 @@ int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid);
int idxd_wq_disable_pasid(struct idxd_wq *wq);
void idxd_wq_quiesce(struct idxd_wq *wq);
int idxd_wq_init_percpu_ref(struct idxd_wq *wq);
+int idxd_wq_abort(struct idxd_wq *wq);
+void idxd_wq_setup_pasid(struct idxd_wq *wq, int pasid);
+void idxd_wq_setup_priv(struct idxd_wq *wq, int priv);
/* submission */
int idxd_submit_desc(struct idxd_wq *wq, struct idxd_desc *desc);
diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h
index c97f700bcf34..d9a732decdd5 100644
--- a/drivers/dma/idxd/registers.h
+++ b/drivers/dma/idxd/registers.h
@@ -347,7 +347,8 @@ union wqcfg {
u32 bits[8];
} __packed;
-#define WQCFG_PASID_IDX 2
+#define WQCFG_PASID_IDX 2
+#define WQCFG_PRIV_IDX 2
/*
* This macro calculates the offset into the WQCFG register
Add the VFIO mediated device driver as an auxiliary device to the main idxd
driver. This allows the mdev code to live under the VFIO mdev subsystem.
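For reference, the auxiliary bus matches devices to drivers on the
"<modname>.<name>" pair, so the device created here ("mdev-dsa" or
"mdev-iax" under the idxd module) binds against the id table in the new
mdev module below:

	/* Device side sets auxdev->name = "mdev-dsa"; the bus prepends "idxd." */
	static const struct auxiliary_device_id idxd_mdev_auxbus_id_table[] = {
		{ .name = "idxd.mdev-dsa" },
		{ .name = "idxd.mdev-iax" },
		{},
	};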
Signed-off-by: Dave Jiang <[email protected]>
---
MAINTAINERS | 8 ++++
drivers/dma/idxd/Makefile | 2 +
drivers/dma/idxd/idxd.h | 7 ++++
drivers/dma/idxd/init.c | 77 +++++++++++++++++++++++++++++++++++++++
drivers/vfio/mdev/Kconfig | 9 +++++
drivers/vfio/mdev/Makefile | 1 +
drivers/vfio/mdev/idxd/Makefile | 4 ++
drivers/vfio/mdev/idxd/mdev.c | 75 ++++++++++++++++++++++++++++++++++++++
8 files changed, 182 insertions(+), 1 deletion(-)
create mode 100644 drivers/vfio/mdev/idxd/Makefile
create mode 100644 drivers/vfio/mdev/idxd/mdev.c
diff --git a/MAINTAINERS b/MAINTAINERS
index ae34b0331eb4..71862e759075 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8970,7 +8970,6 @@ INTEL IADX DRIVER
M: Dave Jiang <[email protected]>
L: [email protected]
S: Supported
-F: Documentation/driver-api/vfio/mdev-idxd.rst
F: drivers/dma/idxd/*
F: include/uapi/linux/idxd.h
@@ -18720,6 +18719,13 @@ F: drivers/vfio/mdev/
F: include/linux/mdev.h
F: samples/vfio-mdev/
+VFIO MEDIATED DEVICE IDXD DRIVER
+M: Dave Jiang <[email protected]>
+L: [email protected]
+S: Maintained
+F: Documentation/driver-api/vfio/mdev-idxd.rst
+F: drivers/vfio/mdev/idxd/
+
VFIO PLATFORM DRIVER
M: Eric Auger <[email protected]>
L: [email protected]
diff --git a/drivers/dma/idxd/Makefile b/drivers/dma/idxd/Makefile
index 8978b898d777..d91d1718efac 100644
--- a/drivers/dma/idxd/Makefile
+++ b/drivers/dma/idxd/Makefile
@@ -1,2 +1,4 @@
+ccflags-y += -DDEFAULT_SYMBOL_NAMESPACE=IDXD
+
obj-$(CONFIG_INTEL_IDXD) += idxd.o
idxd-y := init.o irq.o device.o sysfs.o submit.o dma.o cdev.o
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index a2438b3166db..f02c96164515 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -8,6 +8,7 @@
#include <linux/percpu-rwsem.h>
#include <linux/wait.h>
#include <linux/cdev.h>
+#include <linux/auxiliary_bus.h>
#include "registers.h"
#define IDXD_DRIVER_VERSION "1.00"
@@ -221,6 +222,8 @@ struct idxd_device {
struct work_struct work;
int *int_handles;
+
+ struct auxiliary_device *mdev_auxdev;
};
/* IDXD software descriptor */
@@ -282,6 +285,10 @@ enum idxd_interrupt_type {
IDXD_IRQ_IMS,
};
+struct idxd_mdev_aux_drv {
+ struct auxiliary_driver auxiliary_drv;
+};
+
static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot,
enum idxd_interrupt_type irq_type)
{
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index ee56b92108d8..fd57f39e4b7d 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -382,6 +382,74 @@ static void idxd_disable_system_pasid(struct idxd_device *idxd)
idxd->sva = NULL;
}
+static void idxd_remove_mdev_auxdev(struct idxd_device *idxd)
+{
+ if (!IS_ENABLED(CONFIG_VFIO_MDEV_IDXD))
+ return;
+
+ auxiliary_device_delete(idxd->mdev_auxdev);
+ auxiliary_device_uninit(idxd->mdev_auxdev);
+}
+
+static void idxd_auxdev_release(struct device *dev)
+{
+ struct auxiliary_device *auxdev = to_auxiliary_dev(dev);
+ struct idxd_device *idxd = dev_get_drvdata(dev);
+
+ kfree(auxdev->name);
+ kfree(auxdev);
+ idxd->mdev_auxdev = NULL;
+}
+
+static int idxd_setup_mdev_auxdev(struct idxd_device *idxd)
+{
+ struct auxiliary_device *auxdev;
+ struct device *dev = &idxd->pdev->dev;
+ int rc;
+
+ if (!IS_ENABLED(CONFIG_VFIO_MDEV_IDXD))
+ return 0;
+
+ auxdev = kzalloc(sizeof(*auxdev), GFP_KERNEL);
+ if (!auxdev)
+ return -ENOMEM;
+
+ auxdev->name = kasprintf(GFP_KERNEL, "mdev-%s", idxd_name[idxd->type]);
+ if (!auxdev->name) {
+ rc = -ENOMEM;
+ goto err_name;
+ }
+
+ dev_dbg(&idxd->pdev->dev, "aux dev mdev: %s\n", auxdev->name);
+
+ auxdev->dev.parent = dev;
+ auxdev->dev.release = idxd_auxdev_release;
+ auxdev->id = idxd->id;
+
+ rc = auxiliary_device_init(auxdev);
+ if (rc < 0) {
+ dev_err(dev, "Failed to init aux dev: %d\n", rc);
+ goto err_auxdev;
+ }
+
+ rc = auxiliary_device_add(auxdev);
+ if (rc < 0) {
+ dev_err(dev, "Failed to add aux dev: %d\n", rc);
+ goto err_auxdev;
+ }
+
+ idxd->mdev_auxdev = auxdev;
+ dev_set_drvdata(&auxdev->dev, idxd);
+
+ return 0;
+
+ err_auxdev:
+ kfree(auxdev->name);
+ err_name:
+ kfree(auxdev);
+ return rc;
+}
+
static int idxd_probe(struct idxd_device *idxd)
{
struct pci_dev *pdev = idxd->pdev;
@@ -434,11 +502,19 @@ static int idxd_probe(struct idxd_device *idxd)
goto err_idr_fail;
}
+ rc = idxd_setup_mdev_auxdev(idxd);
+ if (rc < 0)
+ goto err_auxdev_fail;
+
idxd->major = idxd_cdev_get_major(idxd);
dev_dbg(dev, "IDXD device %d probed successfully\n", idxd->id);
return 0;
+ err_auxdev_fail:
+ mutex_lock(&idxd_idr_lock);
+ idr_remove(&idxd_idrs[idxd->type], idxd->id);
+ mutex_unlock(&idxd_idr_lock);
err_idr_fail:
idxd_mask_error_interrupts(idxd);
idxd_mask_msix_vectors(idxd);
@@ -610,6 +686,7 @@ static void idxd_remove(struct pci_dev *pdev)
dev_dbg(&pdev->dev, "%s called\n", __func__);
idxd_cleanup_sysfs(idxd);
idxd_shutdown(pdev);
+ idxd_remove_mdev_auxdev(idxd);
if (device_pasid_enabled(idxd))
idxd_disable_system_pasid(idxd);
mutex_lock(&idxd_idr_lock);
diff --git a/drivers/vfio/mdev/Kconfig b/drivers/vfio/mdev/Kconfig
index 5da27f2100f9..e9540e43d1f1 100644
--- a/drivers/vfio/mdev/Kconfig
+++ b/drivers/vfio/mdev/Kconfig
@@ -16,3 +16,12 @@ config VFIO_MDEV_DEVICE
default n
help
VFIO based driver for Mediated devices.
+
+config VFIO_MDEV_IDXD
+ tristate "VFIO Mediated device driver for Intel IDXD"
+ depends on VFIO && VFIO_MDEV && X86_64
+ select AUXILIARY_BUS
+ select IMS_MSI_ARRAY
+ default n
+ help
+ VFIO based mediated device driver for Intel Accelerator Devices.
diff --git a/drivers/vfio/mdev/Makefile b/drivers/vfio/mdev/Makefile
index 101516fdf375..338843fa6110 100644
--- a/drivers/vfio/mdev/Makefile
+++ b/drivers/vfio/mdev/Makefile
@@ -4,3 +4,4 @@ mdev-y := mdev_core.o mdev_sysfs.o mdev_driver.o
obj-$(CONFIG_VFIO_MDEV) += mdev.o
obj-$(CONFIG_VFIO_MDEV_DEVICE) += vfio_mdev.o
+obj-$(CONFIG_VFIO_MDEV_IDXD) += idxd/
diff --git a/drivers/vfio/mdev/idxd/Makefile b/drivers/vfio/mdev/idxd/Makefile
new file mode 100644
index 000000000000..e8f45cb96117
--- /dev/null
+++ b/drivers/vfio/mdev/idxd/Makefile
@@ -0,0 +1,4 @@
+ccflags-y += -I$(srctree)/drivers/dma/idxd -DDEFAULT_SYMBOL_NAMESPACE=IDXD
+
+obj-$(CONFIG_VFIO_MDEV_IDXD) += idxd_mdev.o
+idxd_mdev-y := mdev.o
diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c
new file mode 100644
index 000000000000..8b9a6adeb606
--- /dev/null
+++ b/drivers/vfio/mdev/idxd/mdev.c
@@ -0,0 +1,75 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2020 Intel Corporation. All rights rsvd. */
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/device.h>
+#include <linux/auxiliary_bus.h>
+#include <uapi/linux/idxd.h>
+#include "registers.h"
+#include "idxd.h"
+
+static int idxd_mdev_host_init(struct idxd_device *idxd)
+{
+ /* FIXME: Fill in later */
+ return 0;
+}
+
+static int idxd_mdev_host_release(struct idxd_device *idxd)
+{
+ /* FIXME: Fill in later */
+ return 0;
+}
+
+static int idxd_mdev_aux_probe(struct auxiliary_device *auxdev,
+ const struct auxiliary_device_id *id)
+{
+ struct idxd_device *idxd = dev_get_drvdata(&auxdev->dev);
+ int rc;
+
+ rc = idxd_mdev_host_init(idxd);
+ if (rc < 0) {
+ dev_warn(&auxdev->dev, "mdev host init failed: %d\n", rc);
+ return rc;
+ }
+
+ return 0;
+}
+
+static void idxd_mdev_aux_remove(struct auxiliary_device *auxdev)
+{
+ struct idxd_device *idxd = dev_get_drvdata(&auxdev->dev);
+
+ idxd_mdev_host_release(idxd);
+}
+
+static const struct auxiliary_device_id idxd_mdev_auxbus_id_table[] = {
+ { .name = "idxd.mdev-dsa" },
+ { .name = "idxd.mdev-iax" },
+ {},
+};
+MODULE_DEVICE_TABLE(auxiliary, idxd_mdev_auxbus_id_table);
+
+static struct idxd_mdev_aux_drv idxd_mdev_aux_drv = {
+ .auxiliary_drv = {
+ .id_table = idxd_mdev_auxbus_id_table,
+ .probe = idxd_mdev_aux_probe,
+ .remove = idxd_mdev_aux_remove,
+ },
+};
+
+static int idxd_mdev_auxdev_drv_register(struct idxd_mdev_aux_drv *drv)
+{
+ return auxiliary_driver_register(&drv->auxiliary_drv);
+}
+
+static void idxd_mdev_auxdev_drv_unregister(struct idxd_mdev_aux_drv *drv)
+{
+ auxiliary_driver_unregister(&drv->auxiliary_drv);
+}
+
+module_driver(idxd_mdev_aux_drv, idxd_mdev_auxdev_drv_register, idxd_mdev_auxdev_drv_unregister);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Intel Corporation");
When a dedicated wq is enabled as an mdev, we must disable the wq on the
device in order to program the pasid to the wq. Introduce a software-only
wq state, IDXD_WQ_LOCKED, to prevent the user from modifying the
configuration while the mdev wq is in this state. While LOCKED, the wq is
not in the DISABLED state, which prevents any modifications to the
configuration; it is also not in the ENABLED state, which prevents any
actions allowed only in the ENABLED state.
For mdev, the dwq is disabled and set to the LOCKED state upon mdev
creation. When ->open() is called on the mdev and a pasid is programmed to
the WQCFG, the dwq is enabled again and goes to the ENABLED state.
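The resulting lifecycle for an mdev-backed dedicated wq, per the
description above:

	/*
	 * IDXD_WQ_ENABLED --(mdev create: hw wq disabled)--> IDXD_WQ_LOCKED
	 * IDXD_WQ_LOCKED --(->open(): pasid programmed, hw wq enabled)--> IDXD_WQ_ENABLED
	 *
	 * While LOCKED, the wq rejects both reconfiguration (it is not
	 * DISABLED) and normal use (it is not ENABLED).
	 */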
Signed-off-by: Dave Jiang <[email protected]>
---
drivers/dma/idxd/device.c | 9 +++++++++
drivers/dma/idxd/idxd.h | 1 +
drivers/dma/idxd/sysfs.c | 2 ++
drivers/vfio/mdev/idxd/mdev.c | 4 +++-
4 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index c5faa23bd8ce..1cd64a6a60de 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -252,6 +252,14 @@ int idxd_wq_disable(struct idxd_wq *wq, u32 *status)
dev_dbg(dev, "Disabling WQ %d\n", wq->id);
+ /*
+ * When the wq is in LOCKED state, it means it is disabled but
+ * also at the same time is "enabled" as far as the user is
+ * concerned. So a call to disable the hardware can be skipped.
+ */
+ if (wq->state == IDXD_WQ_LOCKED)
+ goto out;
+
if (wq->state != IDXD_WQ_ENABLED) {
dev_dbg(dev, "WQ %d in wrong state: %d\n", wq->id, wq->state);
return 0;
@@ -268,6 +276,7 @@ int idxd_wq_disable(struct idxd_wq *wq, u32 *status)
return -ENXIO;
}
+ out:
wq->state = IDXD_WQ_DISABLED;
dev_dbg(dev, "WQ %d disabled\n", wq->id);
return 0;
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index c5ef6ccc9ba6..4afe35385f85 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -62,6 +62,7 @@ struct idxd_group {
enum idxd_wq_state {
IDXD_WQ_DISABLED = 0,
IDXD_WQ_ENABLED,
+ IDXD_WQ_LOCKED,
};
enum idxd_wq_flag {
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index 913ff019fe36..1bce55ac24b9 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -879,6 +879,8 @@ static ssize_t wq_state_show(struct device *dev,
return sprintf(buf, "disabled\n");
case IDXD_WQ_ENABLED:
return sprintf(buf, "enabled\n");
+ case IDXD_WQ_LOCKED:
+ return sprintf(buf, "locked\n");
}
return sprintf(buf, "unknown\n");
diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c
index d59920f78109..60913950a4f5 100644
--- a/drivers/vfio/mdev/idxd/mdev.c
+++ b/drivers/vfio/mdev/idxd/mdev.c
@@ -116,8 +116,10 @@ static void idxd_vdcm_init(struct vdcm_idxd *vidxd)
vidxd_mmio_init(vidxd);
- if (wq_dedicated(wq) && wq->state == IDXD_WQ_ENABLED)
+ if (wq_dedicated(wq) && wq->state == IDXD_WQ_ENABLED) {
idxd_wq_disable(wq, NULL);
+ wq->state = IDXD_WQ_LOCKED;
+ }
}
static void idxd_vdcm_release(struct mdev_device *mdev)
When a device error occurs, the mediated device needs to be notified in
order to notify the guest of the device error. Add support to notify the
specific mdev when an error is wq specific and to broadcast errors to all
mdevs when it's a generic device error.
Signed-off-by: Dave Jiang <[email protected]>
---
drivers/dma/idxd/idxd.h | 7 +++++++
drivers/dma/idxd/irq.c | 6 ++++++
drivers/vfio/mdev/idxd/mdev.c | 5 +++++
drivers/vfio/mdev/idxd/vdev.c | 32 ++++++++++++++++++++++++++++++++
drivers/vfio/mdev/idxd/vdev.h | 1 +
5 files changed, 51 insertions(+)
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 4afe35385f85..6016df029ed4 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -295,10 +295,17 @@ enum idxd_interrupt_type {
IDXD_IRQ_IMS,
};
+struct aux_mdev_ops {
+ void (*notify_error)(struct idxd_wq *wq);
+};
+
struct idxd_mdev_aux_drv {
struct auxiliary_driver auxiliary_drv;
+ const struct aux_mdev_ops ops;
};
+#define to_mdev_aux_drv(_aux_drv) container_of(_aux_drv, struct idxd_mdev_aux_drv, auxiliary_drv)
+
static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot,
enum idxd_interrupt_type irq_type)
{
diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c
index 090926856df3..9cdd3e789799 100644
--- a/drivers/dma/idxd/irq.c
+++ b/drivers/dma/idxd/irq.c
@@ -118,6 +118,8 @@ static int process_misc_interrupts(struct idxd_device *idxd, u32 cause)
u32 val = 0;
int i;
bool err = false;
+ struct auxiliary_driver *auxdrv = to_auxiliary_drv(idxd->mdev_auxdev->dev.driver);
+ struct idxd_mdev_aux_drv *mdevdrv = to_mdev_aux_drv(auxdrv);
if (cause & IDXD_INTC_ERR) {
spin_lock_bh(&idxd->dev_lock);
@@ -132,6 +134,8 @@ static int process_misc_interrupts(struct idxd_device *idxd, u32 cause)
if (wq->type == IDXD_WQT_USER)
wake_up_interruptible(&wq->idxd_cdev.err_queue);
+ else if (wq->type == IDXD_WQT_MDEV)
+ mdevdrv->ops.notify_error(wq);
} else {
int i;
@@ -140,6 +144,8 @@ static int process_misc_interrupts(struct idxd_device *idxd, u32 cause)
if (wq->type == IDXD_WQT_USER)
wake_up_interruptible(&wq->idxd_cdev.err_queue);
+ else if (wq->type == IDXD_WQT_MDEV)
+ mdevdrv->ops.notify_error(wq);
}
}
diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c
index 60913950a4f5..edccaad66c8c 100644
--- a/drivers/vfio/mdev/idxd/mdev.c
+++ b/drivers/vfio/mdev/idxd/mdev.c
@@ -1266,12 +1266,17 @@ static const struct auxiliary_device_id idxd_mdev_auxbus_id_table[] = {
};
MODULE_DEVICE_TABLE(auxiliary, idxd_mdev_auxbus_id_table);
+static const struct aux_mdev_ops aux_mdev_ops = {
+ .notify_error = idxd_wq_vidxd_send_errors,
+};
+
static struct idxd_mdev_aux_drv idxd_mdev_aux_drv = {
.auxiliary_drv = {
.id_table = idxd_mdev_auxbus_id_table,
.probe = idxd_mdev_aux_probe,
.remove = idxd_mdev_aux_remove,
},
+ .ops = aux_mdev_ops,
};
static int idxd_mdev_auxdev_drv_register(struct idxd_mdev_aux_drv *drv)
diff --git a/drivers/vfio/mdev/idxd/vdev.c b/drivers/vfio/mdev/idxd/vdev.c
index 8626438a9e54..3aa9d5b870e8 100644
--- a/drivers/vfio/mdev/idxd/vdev.c
+++ b/drivers/vfio/mdev/idxd/vdev.c
@@ -980,3 +980,35 @@ void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val)
break;
}
}
+
+static void vidxd_send_errors(struct vdcm_idxd *vidxd)
+{
+ struct idxd_device *idxd = vidxd->idxd;
+ u8 *bar0 = vidxd->bar0;
+ union sw_err_reg *swerr = (union sw_err_reg *)(bar0 + IDXD_SWERR_OFFSET);
+ union genctrl_reg *genctrl = (union genctrl_reg *)(bar0 + IDXD_GENCTRL_OFFSET);
+ u32 *intcause = (u32 *)(bar0 + IDXD_INTCAUSE_OFFSET);
+ int i;
+
+ if (swerr->valid) {
+ if (!swerr->overflow)
+ swerr->overflow = 1;
+ return;
+ }
+
+ lockdep_assert_held(&idxd->dev_lock);
+ for (i = 0; i < 4; i++)
+ swerr->bits[i] = idxd->sw_err.bits[i];
+
+ *intcause |= IDXD_INTC_ERR;
+ if (genctrl->softerr_int_en)
+ vidxd_send_interrupt(&vidxd->irq_entries[0]);
+}
+
+void idxd_wq_vidxd_send_errors(struct idxd_wq *wq)
+{
+ struct vdcm_idxd *vidxd;
+
+ list_for_each_entry(vidxd, &wq->vdcm_list, list)
+ vidxd_send_errors(vidxd);
+}
diff --git a/drivers/vfio/mdev/idxd/vdev.h b/drivers/vfio/mdev/idxd/vdev.h
index fc0f405baa40..00df08f9a963 100644
--- a/drivers/vfio/mdev/idxd/vdev.h
+++ b/drivers/vfio/mdev/idxd/vdev.h
@@ -23,5 +23,6 @@ int vidxd_send_interrupt(struct ims_irq_entry *iie);
int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd);
void vidxd_free_ims_entries(struct vdcm_idxd *vidxd);
void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val);
+void idxd_wq_vidxd_send_errors(struct idxd_wq *wq);
#endif
Add all the helper functions that support the emulation of the commands
that are submitted to the device command register.
Signed-off-by: Dave Jiang <[email protected]>
---
drivers/dma/idxd/device.c | 5
drivers/dma/idxd/registers.h | 16 +
drivers/vfio/mdev/idxd/mdev.c | 2
drivers/vfio/mdev/idxd/mdev.h | 3
drivers/vfio/mdev/idxd/vdev.c | 440 +++++++++++++++++++++++++++++++++++++++++
5 files changed, 460 insertions(+), 6 deletions(-)
diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index 245d576ddc43..c5faa23bd8ce 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -242,6 +242,7 @@ int idxd_wq_enable(struct idxd_wq *wq, u32 *status)
dev_dbg(dev, "WQ %d enabled\n", wq->id);
return 0;
}
+EXPORT_SYMBOL_GPL(idxd_wq_enable);
int idxd_wq_disable(struct idxd_wq *wq, u32 *status)
{
@@ -299,6 +300,7 @@ int idxd_wq_drain(struct idxd_wq *wq, u32 *status)
dev_dbg(dev, "WQ %d drained\n", wq->id);
return 0;
}
+EXPORT_SYMBOL_GPL(idxd_wq_drain);
int idxd_wq_map_portal(struct idxd_wq *wq)
{
@@ -351,6 +353,7 @@ int idxd_wq_abort(struct idxd_wq *wq, u32 *status)
dev_dbg(dev, "WQ %d aborted\n", wq->id);
return 0;
}
+EXPORT_SYMBOL_GPL(idxd_wq_abort);
int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid)
{
@@ -470,6 +473,7 @@ void idxd_wq_setup_pasid(struct idxd_wq *wq, int pasid)
wq->wqcfg->pasid = pasid;
iowrite32(wq->wqcfg->bits[WQCFG_PASID_IDX], idxd->reg_base + offset);
}
+EXPORT_SYMBOL_GPL(idxd_wq_setup_pasid);
void idxd_wq_setup_priv(struct idxd_wq *wq, int priv)
{
@@ -483,6 +487,7 @@ void idxd_wq_setup_priv(struct idxd_wq *wq, int priv)
wq->wqcfg->priv = !!priv;
iowrite32(wq->wqcfg->bits[WQCFG_PRIV_IDX], idxd->reg_base + offset);
}
+EXPORT_SYMBOL_GPL(idxd_wq_setup_priv);
/* Device control bits */
static inline bool idxd_is_enabled(struct idxd_device *idxd)
diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h
index 50ea94259c99..0f985787417c 100644
--- a/drivers/dma/idxd/registers.h
+++ b/drivers/dma/idxd/registers.h
@@ -120,7 +120,8 @@ union gencfg_reg {
union genctrl_reg {
struct {
u32 softerr_int_en:1;
- u32 rsvd:31;
+ u32 halt_state_int_en:1;
+ u32 rsvd:30;
};
u32 bits;
} __packed;
@@ -142,6 +143,8 @@ enum idxd_device_status_state {
IDXD_DEVICE_STATE_HALT,
};
+#define IDXD_GENSTATS_MASK 0x03
+
enum idxd_device_reset_type {
IDXD_DEVICE_RESET_SOFTWARE = 0,
IDXD_DEVICE_RESET_FLR,
@@ -154,6 +157,7 @@ enum idxd_device_reset_type {
#define IDXD_INTC_CMD 0x02
#define IDXD_INTC_OCCUPY 0x04
#define IDXD_INTC_PERFMON_OVFL 0x08
+#define IDXD_INTC_HALT_STATE 0x10
#define IDXD_CMD_OFFSET 0xa0
union idxd_command_reg {
@@ -165,6 +169,7 @@ union idxd_command_reg {
};
u32 bits;
} __packed;
+#define IDXD_CMD_INT_MASK 0x80000000
enum idxd_cmd {
IDXD_CMD_ENABLE_DEVICE = 1,
@@ -228,10 +233,11 @@ enum idxd_cmdsts_err {
/* disable device errors */
IDXD_CMDSTS_ERR_DIS_DEV_EN = 0x31,
/* disable WQ, drain WQ, abort WQ, reset WQ */
- IDXD_CMDSTS_ERR_DEV_NOT_EN,
+ IDXD_CMDSTS_ERR_WQ_NOT_EN,
/* request interrupt handle */
IDXD_CMDSTS_ERR_INVAL_INT_IDX = 0x41,
IDXD_CMDSTS_ERR_NO_HANDLE,
+ IDXD_CMDSTS_ERR_INVAL_INT_IDX_RELEASE,
};
#define IDXD_CMDCAP_OFFSET 0xb0
@@ -353,6 +359,12 @@ union wqcfg {
u32 bits[8];
} __packed;
+enum idxd_wq_hw_state {
+ IDXD_WQ_DEV_DISABLED = 0,
+ IDXD_WQ_DEV_ENABLED,
+ IDXD_WQ_DEV_BUSY,
+};
+
#define WQCFG_PASID_IDX 2
#define WQCFG_PRIV_IDX 2
#define WQCFG_MODE_DEDICATED 1
diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c
index 67e6b33468cd..7cde707021db 100644
--- a/drivers/vfio/mdev/idxd/mdev.c
+++ b/drivers/vfio/mdev/idxd/mdev.c
@@ -52,7 +52,7 @@ static char idxd_iax_1dwq_name[IDXD_MDEV_NAME_LEN];
static int idxd_vdcm_set_irqs(struct vdcm_idxd *vidxd, uint32_t flags, unsigned int index,
unsigned int start, unsigned int count, void *data);
-static int idxd_mdev_get_pasid(struct mdev_device *mdev, u32 *pasid)
+int idxd_mdev_get_pasid(struct mdev_device *mdev, u32 *pasid)
{
struct vfio_group *vfio_group;
struct iommu_domain *iommu_domain;
diff --git a/drivers/vfio/mdev/idxd/mdev.h b/drivers/vfio/mdev/idxd/mdev.h
index 7ca50f054714..8421b4962ac7 100644
--- a/drivers/vfio/mdev/idxd/mdev.h
+++ b/drivers/vfio/mdev/idxd/mdev.h
@@ -38,6 +38,7 @@ struct ims_irq_entry {
bool irq_set;
int id;
int irq;
+ int ims_idx;
};
struct idxd_vdev {
@@ -112,4 +113,6 @@ static inline u64 get_reg_val(void *buf, int size)
return val;
}
+int idxd_mdev_get_pasid(struct mdev_device *mdev, u32 *pasid);
+
#endif
diff --git a/drivers/vfio/mdev/idxd/vdev.c b/drivers/vfio/mdev/idxd/vdev.c
index 958b09987e5c..766fd98e9eea 100644
--- a/drivers/vfio/mdev/idxd/vdev.c
+++ b/drivers/vfio/mdev/idxd/vdev.c
@@ -492,17 +492,451 @@ void vidxd_mmio_init(struct vdcm_idxd *vidxd)
static void idxd_complete_command(struct vdcm_idxd *vidxd, enum idxd_cmdsts_err val)
{
- /* PLACEHOLDER */
+ u8 *bar0 = vidxd->bar0;
+ u32 *cmd = (u32 *)(bar0 + IDXD_CMD_OFFSET);
+ u32 *cmdsts = (u32 *)(bar0 + IDXD_CMDSTS_OFFSET);
+ u32 *intcause = (u32 *)(bar0 + IDXD_INTCAUSE_OFFSET);
+ struct mdev_device *mdev = vidxd->vdev.mdev;
+ struct device *dev = mdev_dev(mdev);
+
+ *cmdsts = val;
+ dev_dbg(dev, "%s: cmd: %#x status: %#x\n", __func__, *cmd, val);
+
+ if (*cmd & IDXD_CMD_INT_MASK) {
+ *intcause |= IDXD_INTC_CMD;
+ vidxd_send_interrupt(&vidxd->irq_entries[0]);
+ }
+}
+
+static void vidxd_enable(struct vdcm_idxd *vidxd)
+{
+ u8 *bar0 = vidxd->bar0;
+ union gensts_reg *gensts = (union gensts_reg *)(bar0 + IDXD_GENSTATS_OFFSET);
+ struct mdev_device *mdev = vidxd->vdev.mdev;
+ struct device *dev = mdev_dev(mdev);
+
+ dev_dbg(dev, "%s\n", __func__);
+ if (gensts->state == IDXD_DEVICE_STATE_ENABLED)
+ return idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DEV_ENABLED);
+
+ /* Check PCI configuration */
+ if (!(vidxd->cfg[PCI_COMMAND] & PCI_COMMAND_MASTER))
+ return idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_BUSMASTER_EN);
+
+ gensts->state = IDXD_DEVICE_STATE_ENABLED;
+
+ return idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
+}
+
+static void vidxd_disable(struct vdcm_idxd *vidxd)
+{
+ struct idxd_wq *wq;
+ union wqcfg *wqcfg;
+ u8 *bar0 = vidxd->bar0;
+ union gensts_reg *gensts = (union gensts_reg *)(bar0 + IDXD_GENSTATS_OFFSET);
+ struct mdev_device *mdev = vidxd->vdev.mdev;
+ struct device *dev = mdev_dev(mdev);
+ u32 status;
+
+ dev_dbg(dev, "%s\n", __func__);
+ if (gensts->state == IDXD_DEVICE_STATE_DISABLED) {
+ idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DIS_DEV_EN);
+ return;
+ }
+
+ wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET);
+ wq = vidxd->wq;
+
+ /* If it is a DWQ, need to disable the DWQ as well */
+ if (wq_dedicated(wq)) {
+ idxd_wq_disable(wq, &status);
+ if (status) {
+ dev_warn(dev, "vidxd disable (wq disable) failed: %#x\n", status);
+ idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DIS_DEV_EN);
+ return;
+ }
+ } else {
+ idxd_wq_drain(wq, &status);
+ if (status)
+ dev_warn(dev, "vidxd disable (wq drain) failed: %#x\n", status);
+ }
+
+ wqcfg->wq_state = 0;
+ gensts->state = IDXD_DEVICE_STATE_DISABLED;
+ idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
+}
+
+static void vidxd_drain_all(struct vdcm_idxd *vidxd)
+{
+ struct mdev_device *mdev = vidxd->vdev.mdev;
+ struct device *dev = mdev_dev(mdev);
+ struct idxd_wq *wq = vidxd->wq;
+
+ dev_dbg(dev, "%s\n", __func__);
+
+ idxd_wq_drain(wq, NULL);
+ idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
+}
+
+static void vidxd_wq_drain(struct vdcm_idxd *vidxd, int val)
+{
+ struct mdev_device *mdev = vidxd->vdev.mdev;
+ struct device *dev = mdev_dev(mdev);
+ u8 *bar0 = vidxd->bar0;
+ union wqcfg *wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET);
+ struct idxd_wq *wq = vidxd->wq;
+ u32 status;
+
+ dev_dbg(dev, "%s\n", __func__);
+ if (wqcfg->wq_state != IDXD_WQ_DEV_ENABLED) {
+ idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_NOT_EN);
+ return;
+ }
+
+ idxd_wq_drain(wq, &status);
+ if (status) {
+ dev_dbg(dev, "wq drain failed: %#x\n", status);
+ idxd_complete_command(vidxd, status);
+ return;
+ }
+
+ idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
+}
+
+static void vidxd_abort_all(struct vdcm_idxd *vidxd)
+{
+ struct mdev_device *mdev = vidxd->vdev.mdev;
+ struct device *dev = mdev_dev(mdev);
+ struct idxd_wq *wq = vidxd->wq;
+
+ dev_dbg(dev, "%s\n", __func__);
+ idxd_wq_abort(wq, NULL);
+ idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
+}
+
+static void vidxd_wq_abort(struct vdcm_idxd *vidxd, int val)
+{
+ struct mdev_device *mdev = vidxd->vdev.mdev;
+ struct device *dev = mdev_dev(mdev);
+ u8 *bar0 = vidxd->bar0;
+ union wqcfg *wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET);
+ struct idxd_wq *wq = vidxd->wq;
+ u32 status;
+
+ dev_dbg(dev, "%s\n", __func__);
+ if (wqcfg->wq_state != IDXD_WQ_DEV_ENABLED) {
+ idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_NOT_EN);
+ return;
+ }
+
+ idxd_wq_abort(wq, &status);
+ if (status) {
+ dev_dbg(dev, "wq abort failed: %#x\n", status);
+ idxd_complete_command(vidxd, status);
+ return;
+ }
+
+ idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
}
void vidxd_reset(struct vdcm_idxd *vidxd)
{
- /* PLACEHOLDER */
+ struct mdev_device *mdev = vidxd->vdev.mdev;
+ struct device *dev = mdev_dev(mdev);
+ u8 *bar0 = vidxd->bar0;
+ union gensts_reg *gensts = (union gensts_reg *)(bar0 + IDXD_GENSTATS_OFFSET);
+ struct idxd_wq *wq;
+
+ dev_dbg(dev, "%s\n", __func__);
+ gensts->state = IDXD_DEVICE_STATE_DRAIN;
+ wq = vidxd->wq;
+
+ if (wq->state == IDXD_WQ_ENABLED) {
+ idxd_wq_abort(wq, NULL);
+ idxd_wq_disable(wq, NULL);
+ }
+
+ vidxd_mmio_init(vidxd);
+ gensts->state = IDXD_DEVICE_STATE_DISABLED;
+ idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
+}
+
+static void vidxd_wq_reset(struct vdcm_idxd *vidxd, int wq_id_mask)
+{
+ struct idxd_wq *wq;
+ u8 *bar0 = vidxd->bar0;
+ union wqcfg *wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET);
+ struct mdev_device *mdev = vidxd->vdev.mdev;
+ struct device *dev = mdev_dev(mdev);
+ u32 status;
+
+ wq = vidxd->wq;
+ dev_dbg(dev, "vidxd reset wq %u:%u\n", 0, wq->id);
+
+ if (wqcfg->wq_state != IDXD_WQ_DEV_ENABLED) {
+ idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_NOT_EN);
+ return;
+ }
+
+ idxd_wq_abort(wq, &status);
+ if (status) {
+ dev_dbg(dev, "vidxd reset wq failed to abort: %#x\n", status);
+ idxd_complete_command(vidxd, status);
+ return;
+ }
+
+ idxd_wq_disable(wq, &status);
+ if (status) {
+ dev_dbg(dev, "vidxd reset wq failed to disable: %#x\n", status);
+ idxd_complete_command(vidxd, status);
+ return;
+ }
+
+ wqcfg->wq_state = IDXD_WQ_DEV_DISABLED;
+ idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
+}
+
+static void vidxd_alloc_int_handle(struct vdcm_idxd *vidxd, int operand)
+{
+ bool ims = !!(operand & CMD_INT_HANDLE_IMS);
+ u32 cmdsts;
+ struct mdev_device *mdev = vidxd->vdev.mdev;
+ struct device *dev = mdev_dev(mdev);
+ int ims_idx, vidx;
+
+ vidx = operand & GENMASK(15, 0);
+
+ dev_dbg(dev, "allocating int handle for %d\n", vidx);
+
+ /* vidx cannot be 0 since that's emulated and does not require IMS handle */
+ if (vidx <= 0 || vidx >= VIDXD_MAX_MSIX_ENTRIES) {
+ idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_INVAL_INT_IDX);
+ return;
+ }
+
+ if (ims) {
+ dev_warn(dev, "IMS allocation is not implemented yet\n");
+ idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_NO_HANDLE);
+ return;
+ }
+
+ ims_idx = vidxd->irq_entries[vidx].ims_idx;
+ cmdsts = ims_idx << IDXD_CMDSTS_RES_SHIFT;
+ dev_dbg(dev, "requested index %d handle %d\n", vidx, ims_idx);
+ idxd_complete_command(vidxd, cmdsts);
+}
+
+static void vidxd_release_int_handle(struct vdcm_idxd *vidxd, int operand)
+{
+ struct mdev_device *mdev = vidxd->vdev.mdev;
+ struct device *dev = mdev_dev(mdev);
+ bool ims = !!(operand & CMD_INT_HANDLE_IMS);
+ int handle, i;
+ bool found = false;
+
+ handle = operand & GENMASK(15, 0);
+ dev_dbg(dev, "allocating int handle %d\n", handle);
+
+ if (ims) {
+ dev_warn(dev, "IMS allocation is not implemented yet\n");
+ idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_INVAL_INT_IDX_RELEASE);
+ return;
+ }
+
+ /* IMS backed entry start at 1, 0 is emulated vector */
+ for (i = 1; i < VIDXD_MAX_MSIX_ENTRIES; i++) {
+ if (vidxd->irq_entries[i].ims_idx == handle) {
+ found = true;
+ break;
+ }
+ }
+
+ if (!found) {
+ dev_warn(dev, "Freeing unallocated int handle.\n");
+ idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_INVAL_INT_IDX_RELEASE);
+ return;
+ }
+
+ dev_dbg(dev, "int handle %d released.\n", handle);
+ idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
+}
+
+static void vidxd_wq_enable(struct vdcm_idxd *vidxd, int wq_id)
+{
+ struct idxd_wq *wq;
+ u8 *bar0 = vidxd->bar0;
+ union wq_cap_reg *wqcap;
+ struct mdev_device *mdev = vidxd->vdev.mdev;
+ struct device *dev = mdev_dev(mdev);
+ struct idxd_device *idxd;
+ union wqcfg *vwqcfg, *wqcfg;
+ unsigned long flags;
+ u32 status, wq_pasid;
+ int priv, rc;
+
+ if (wq_id >= VIDXD_MAX_WQS) {
+ idxd_complete_command(vidxd, IDXD_CMDSTS_INVAL_WQIDX);
+ return;
+ }
+
+ idxd = vidxd->idxd;
+ wq = vidxd->wq;
+
+ dev_dbg(dev, "%s: wq %u:%u\n", __func__, wq_id, wq->id);
+
+ vwqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET + wq_id * 32);
+ wqcap = (union wq_cap_reg *)(bar0 + IDXD_WQCAP_OFFSET);
+ wqcfg = wq->wqcfg;
+
+ if (vidxd_state(vidxd) != IDXD_DEVICE_STATE_ENABLED) {
+ idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DEV_NOTEN);
+ return;
+ }
+
+ if (vwqcfg->wq_state != IDXD_WQ_DEV_DISABLED) {
+ idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_ENABLED);
+ return;
+ }
+
+ if (wq_dedicated(wq) && wqcap->dedicated_mode == 0) {
+ idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_MODE);
+ return;
+ }
+
+ priv = 1;
+ rc = idxd_mdev_get_pasid(mdev, &wq_pasid);
+ if (rc < 0) {
+ dev_err(dev, "idxd pasid setup failed wq %d: %d\n", wq->id, rc);
+ idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_PASID_EN);
+ return;
+ }
+
+ /* Clear pasid_en, pasid, and priv values */
+ wqcfg->bits[WQCFG_PASID_IDX] &= ~GENMASK(29, 8);
+ wqcfg->priv = priv;
+ wqcfg->pasid_en = 1;
+ wqcfg->pasid = wq_pasid;
+ dev_dbg(dev, "program pasid %d in wq %d\n", wq_pasid, wq->id);
+ spin_lock_irqsave(&idxd->dev_lock, flags);
+ idxd_wq_setup_pasid(wq, wq_pasid);
+ idxd_wq_setup_priv(wq, priv);
+ spin_unlock_irqrestore(&idxd->dev_lock, flags);
+ idxd_wq_enable(wq, &status);
+ if (status) {
+ dev_err(dev, "vidxd enable wq %d failed\n", wq->id);
+ idxd_complete_command(vidxd, status);
+ return;
+ }
+
+ vwqcfg->wq_state = IDXD_WQ_DEV_ENABLED;
+ idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
+}
+
+static void vidxd_wq_disable(struct vdcm_idxd *vidxd, int wq_id_mask)
+{
+ struct idxd_wq *wq;
+ union wqcfg *wqcfg;
+ u8 *bar0 = vidxd->bar0;
+ struct mdev_device *mdev = vidxd->vdev.mdev;
+ struct device *dev = mdev_dev(mdev);
+ u32 status;
+
+ wq = vidxd->wq;
+
+ dev_dbg(dev, "vidxd disable wq %u:%u\n", 0, wq->id);
+
+ wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET);
+ if (wqcfg->wq_state != IDXD_WQ_DEV_ENABLED) {
+ idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_NOT_EN);
+ return;
+ }
+
+ /* If it is a DWQ, need to disable the DWQ as well */
+ if (wq_dedicated(wq)) {
+ idxd_wq_disable(wq, &status);
+ if (status) {
+ dev_warn(dev, "vidxd disable wq failed: %#x\n", status);
+ idxd_complete_command(vidxd, status);
+ return;
+ }
+ } else {
+ idxd_wq_drain(wq, &status);
+ if (status) {
+ dev_warn(dev, "vidxd disable drain wq failed: %#x\n", status);
+ idxd_complete_command(vidxd, status);
+ return;
+ }
+ }
+
+ wqcfg->wq_state = IDXD_WQ_DEV_DISABLED;
+ idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
+}
+
+static bool command_supported(struct vdcm_idxd *vidxd, u32 cmd)
+{
+ struct idxd_device *idxd = vidxd->idxd;
+
+ if (cmd == IDXD_CMD_REQUEST_INT_HANDLE || cmd == IDXD_CMD_RELEASE_INT_HANDLE)
+ return true;
+
+ return !!(idxd->hw.opcap.bits[0] & BIT_ULL(cmd));
}
void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val)
{
- /* PLACEHOLDER */
+ union idxd_command_reg *reg = (union idxd_command_reg *)(vidxd->bar0 + IDXD_CMD_OFFSET);
+ struct mdev_device *mdev = vidxd->vdev.mdev;
+ struct device *dev = mdev_dev(mdev);
+
+ reg->bits = val;
+
+ dev_dbg(dev, "%s: cmd code: %u reg: %x\n", __func__, reg->cmd, reg->bits);
+
+ if (!command_supported(vidxd, reg->cmd)) {
+ idxd_complete_command(vidxd, IDXD_CMDSTS_INVAL_CMD);
+ return;
+ }
+
+ switch (reg->cmd) {
+ case IDXD_CMD_ENABLE_DEVICE:
+ vidxd_enable(vidxd);
+ break;
+ case IDXD_CMD_DISABLE_DEVICE:
+ vidxd_disable(vidxd);
+ break;
+ case IDXD_CMD_DRAIN_ALL:
+ vidxd_drain_all(vidxd);
+ break;
+ case IDXD_CMD_ABORT_ALL:
+ vidxd_abort_all(vidxd);
+ break;
+ case IDXD_CMD_RESET_DEVICE:
+ vidxd_reset(vidxd);
+ break;
+ case IDXD_CMD_ENABLE_WQ:
+ vidxd_wq_enable(vidxd, reg->operand);
+ break;
+ case IDXD_CMD_DISABLE_WQ:
+ vidxd_wq_disable(vidxd, reg->operand);
+ break;
+ case IDXD_CMD_DRAIN_WQ:
+ vidxd_wq_drain(vidxd, reg->operand);
+ break;
+ case IDXD_CMD_ABORT_WQ:
+ vidxd_wq_abort(vidxd, reg->operand);
+ break;
+ case IDXD_CMD_RESET_WQ:
+ vidxd_wq_reset(vidxd, reg->operand);
+ break;
+ case IDXD_CMD_REQUEST_INT_HANDLE:
+ vidxd_alloc_int_handle(vidxd, reg->operand);
+ break;
+ case IDXD_CMD_RELEASE_INT_HANDLE:
+ vidxd_release_int_handle(vidxd, reg->operand);
+ break;
+ default:
+ idxd_complete_command(vidxd, IDXD_CMDSTS_INVAL_CMD);
+ break;
+ }
}
int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd)
Add IMS setup for the mediated device.
On the actual hardware, MSI-X vector 0 is the misc interrupt vector and
handles events such as administrative command completion, error
reporting, performance monitor overflow, etc. The MSI-X vectors
1...N are used for descriptor completion interrupts. On the guest
kernel, the MSI-X interrupts are backed by the mediated device through
emulation or IMS vectors. Vector 0 is handled through emulation by
the host vdcm. Vector 1 (and more may be supported later) is
backed by IMS.
IMS can be set up with interrupt handlers via request_irq() just like
MSI-X interrupts once the relevant IRQ domain is set.
The msi_domain_alloc_irqs()/msi_domain_free_irqs() APIs can then be
used to allocate interrupts from the domain set above.
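As a hedged sketch, wiring a handler to one of the allocated IMS-backed
vectors then looks like ordinary MSI-X usage (the handler and name below
are placeholders, not taken from this series):

	/* irq_entry->irq was filled in from the msi descriptor during setup */
	rc = request_irq(irq_entry->irq, vidxd_ims_irq_handler, 0,
			 "vidxd-ims", irq_entry);	/* placeholder handler/name */
	if (rc < 0)
		return rc;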
Signed-off-by: Dave Jiang <[email protected]>
---
drivers/dma/idxd/idxd.h | 1 +
drivers/vfio/mdev/idxd/mdev.c | 12 +++++++++
drivers/vfio/mdev/idxd/vdev.c | 53 ++++++++++++++++++++++++++++++++---------
kernel/irq/msi.c | 2 ++
4 files changed, 57 insertions(+), 11 deletions(-)
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 41eee987c9b7..c5ef6ccc9ba6 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -224,6 +224,7 @@ struct idxd_device {
struct workqueue_struct *wq;
struct work_struct work;
+ struct irq_domain *ims_domain;
int *int_handles;
struct auxiliary_device *mdev_auxdev;
diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c
index 7cde707021db..8a4af882a47f 100644
--- a/drivers/vfio/mdev/idxd/mdev.c
+++ b/drivers/vfio/mdev/idxd/mdev.c
@@ -1167,6 +1167,7 @@ static int alloc_supported_types(struct idxd_device *idxd)
int idxd_mdev_host_init(struct idxd_device *idxd)
{
struct device *dev = &idxd->pdev->dev;
+ struct ims_array_info ims_info;
int rc;
if (!test_bit(IDXD_FLAG_IMS_SUPPORTED, &idxd->flags))
@@ -1188,6 +1189,15 @@ int idxd_mdev_host_init(struct idxd_device *idxd)
return -EOPNOTSUPP;
}
+ ims_info.max_slots = idxd->ims_size;
+ ims_info.slots = idxd->reg_base + idxd->ims_offset;
+ idxd->ims_domain = pci_ims_array_create_msi_irq_domain(idxd->pdev, &ims_info);
+ if (!idxd->ims_domain) {
+ dev_warn(dev, "Fail to acquire IMS domain\n");
+ iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_AUX);
+ return -ENODEV;
+ }
+
return mdev_register_device(dev, &idxd_vdcm_ops);
}
@@ -1196,6 +1206,8 @@ void idxd_mdev_host_release(struct idxd_device *idxd)
struct device *dev = &idxd->pdev->dev;
int rc;
+ irq_domain_remove(idxd->ims_domain);
+
mdev_unregister_device(dev);
if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) {
rc = iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_AUX);
diff --git a/drivers/vfio/mdev/idxd/vdev.c b/drivers/vfio/mdev/idxd/vdev.c
index 766fd98e9eea..8626438a9e54 100644
--- a/drivers/vfio/mdev/idxd/vdev.c
+++ b/drivers/vfio/mdev/idxd/vdev.c
@@ -16,6 +16,7 @@
#include <linux/intel-svm.h>
#include <linux/kvm_host.h>
#include <linux/eventfd.h>
+#include <linux/irqchip/irq-ims-msi.h>
#include <uapi/linux/idxd.h>
#include "registers.h"
#include "idxd.h"
@@ -871,6 +872,47 @@ static void vidxd_wq_disable(struct vdcm_idxd *vidxd, int wq_id_mask)
idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
}
+void vidxd_free_ims_entries(struct vdcm_idxd *vidxd)
+{
+ struct mdev_device *mdev = vidxd->vdev.mdev;
+ struct device *dev = mdev_dev(mdev);
+
+ msi_domain_free_irqs(dev_get_msi_domain(dev), dev);
+}
+
+int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd)
+{
+ struct irq_domain *irq_domain;
+ struct idxd_device *idxd = vidxd->idxd;
+ struct mdev_device *mdev = vidxd->vdev.mdev;
+ struct device *dev = mdev_dev(mdev);
+ struct msi_desc *entry;
+ struct ims_irq_entry *irq_entry;
+ int rc, i;
+
+ irq_domain = idxd->ims_domain;
+ dev_set_msi_domain(dev, irq_domain);
+
+ /* Allocate VIDXD_MAX_MSIX_VECS - 1 vectors because vector 0 is emulated and not IMS backed. */
+ rc = msi_domain_alloc_irqs(irq_domain, dev, VIDXD_MAX_MSIX_VECS - 1);
+ if (rc < 0)
+ return rc;
+ /*
+ * The first MSIX vector on the guest is emulated and not backed by IMS. To keep things
+ * simple, the ims entries include the emulated vector. Here the code starts at index
+ * 1 to set up all the IMS backed vectors.
+ */
+ i = 1;
+ for_each_msi_entry(entry, dev) {
+ irq_entry = &vidxd->irq_entries[i];
+ irq_entry->ims_idx = entry->device_msi.hwirq;
+ irq_entry->irq = entry->irq;
+ i++;
+ }
+
+ return 0;
+}
+
static bool command_supported(struct vdcm_idxd *vidxd, u32 cmd)
{
struct idxd_device *idxd = vidxd->idxd;
@@ -938,14 +980,3 @@ void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val)
break;
}
}
-
-int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd)
-{
- /* PLACEHOLDER */
- return 0;
-}
-
-void vidxd_free_ims_entries(struct vdcm_idxd *vidxd)
-{
- /* PLACEHOLDER */
-}
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index d70d92eac322..d95299b4ae79 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -536,6 +536,7 @@ int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
return ops->domain_alloc_irqs(domain, dev, nvec);
}
+EXPORT_SYMBOL_GPL(msi_domain_alloc_irqs);
void __msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
{
@@ -572,6 +573,7 @@ void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
return ops->domain_free_irqs(domain, dev);
}
+EXPORT_SYMBOL_GPL(msi_domain_free_irqs);
/**
* msi_get_domain_info - Get the MSI interrupt domain info for @domain
Update some of the device commands in order to support usage by the virtual
device commands emulated by the vdcm. Expose the commands' raw status so the
virtual device commands can utilize it accordingly.
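Callers that care about the raw command status pass a pointer and callers
that don't pass NULL, e.g. (condensed from the vdcm paths in this series):

	u32 status;

	idxd_wq_disable(wq, &status);	/* capture the raw IDXD_CMDSTS_* value */
	if (status)
		idxd_complete_command(vidxd, status);

	idxd_wq_drain(wq, NULL);	/* raw status not needed */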
Signed-off-by: Dave Jiang <[email protected]>
---
drivers/dma/idxd/cdev.c | 2 +
drivers/dma/idxd/device.c | 69 +++++++++++++++++++++++++++--------------
drivers/dma/idxd/idxd.h | 8 ++---
drivers/dma/idxd/irq.c | 2 +
drivers/dma/idxd/sysfs.c | 8 ++---
drivers/vfio/mdev/idxd/mdev.c | 2 +
6 files changed, 56 insertions(+), 35 deletions(-)
diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index b1518106434f..f46328ba8493 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -160,7 +160,7 @@ static int idxd_cdev_release(struct inode *node, struct file *filep)
if (rc < 0)
dev_err(dev, "wq disable pasid failed.\n");
} else {
- idxd_wq_drain(wq);
+ idxd_wq_drain(wq, NULL);
}
}
diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index 89fa2bbe6ebf..245d576ddc43 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -216,22 +216,25 @@ void idxd_wq_free_resources(struct idxd_wq *wq)
sbitmap_queue_free(&wq->sbq);
}
-int idxd_wq_enable(struct idxd_wq *wq)
+int idxd_wq_enable(struct idxd_wq *wq, u32 *status)
{
struct idxd_device *idxd = wq->idxd;
struct device *dev = &idxd->pdev->dev;
- u32 status;
+ u32 stat;
if (wq->state == IDXD_WQ_ENABLED) {
dev_dbg(dev, "WQ %d already enabled\n", wq->id);
return -ENXIO;
}
- idxd_cmd_exec(idxd, IDXD_CMD_ENABLE_WQ, wq->id, &status);
+ idxd_cmd_exec(idxd, IDXD_CMD_ENABLE_WQ, wq->id, &stat);
- if (status != IDXD_CMDSTS_SUCCESS &&
- status != IDXD_CMDSTS_ERR_WQ_ENABLED) {
- dev_dbg(dev, "WQ enable failed: %#x\n", status);
+ if (status)
+ *status = stat;
+
+ if (stat != IDXD_CMDSTS_SUCCESS &&
+ stat != IDXD_CMDSTS_ERR_WQ_ENABLED) {
+ dev_dbg(dev, "WQ enable failed: %#x\n", stat);
return -ENXIO;
}
@@ -240,11 +243,11 @@ int idxd_wq_enable(struct idxd_wq *wq)
return 0;
}
-int idxd_wq_disable(struct idxd_wq *wq)
+int idxd_wq_disable(struct idxd_wq *wq, u32 *status)
{
struct idxd_device *idxd = wq->idxd;
struct device *dev = &idxd->pdev->dev;
- u32 status, operand;
+ u32 stat, operand;
dev_dbg(dev, "Disabling WQ %d\n", wq->id);
@@ -254,10 +257,13 @@ int idxd_wq_disable(struct idxd_wq *wq)
}
operand = BIT(wq->id % 16) | ((wq->id / 16) << 16);
- idxd_cmd_exec(idxd, IDXD_CMD_DISABLE_WQ, operand, &status);
+ idxd_cmd_exec(idxd, IDXD_CMD_DISABLE_WQ, operand, &stat);
+
+ if (status)
+ *status = stat;
- if (status != IDXD_CMDSTS_SUCCESS) {
- dev_dbg(dev, "WQ disable failed: %#x\n", status);
+ if (stat != IDXD_CMDSTS_SUCCESS) {
+ dev_dbg(dev, "WQ disable failed: %#x\n", stat);
return -ENXIO;
}
@@ -267,20 +273,31 @@ int idxd_wq_disable(struct idxd_wq *wq)
}
EXPORT_SYMBOL_GPL(idxd_wq_disable);
-void idxd_wq_drain(struct idxd_wq *wq)
+int idxd_wq_drain(struct idxd_wq *wq, u32 *status)
{
struct idxd_device *idxd = wq->idxd;
struct device *dev = &idxd->pdev->dev;
- u32 operand;
+ u32 operand, stat;
if (wq->state != IDXD_WQ_ENABLED) {
dev_dbg(dev, "WQ %d in wrong state: %d\n", wq->id, wq->state);
- return;
+ return 0;
}
dev_dbg(dev, "Draining WQ %d\n", wq->id);
operand = BIT(wq->id % 16) | ((wq->id / 16) << 16);
- idxd_cmd_exec(idxd, IDXD_CMD_DRAIN_WQ, operand, NULL);
+ idxd_cmd_exec(idxd, IDXD_CMD_DRAIN_WQ, operand, &stat);
+
+ if (status)
+ *status = stat;
+
+ if (stat != IDXD_CMDSTS_SUCCESS) {
+ dev_dbg(dev, "WQ drain failed: %#x\n", stat);
+ return -ENXIO;
+ }
+
+ dev_dbg(dev, "WQ %d drained\n", wq->id);
+ return 0;
}
int idxd_wq_map_portal(struct idxd_wq *wq)
@@ -307,11 +324,11 @@ void idxd_wq_unmap_portal(struct idxd_wq *wq)
devm_iounmap(dev, wq->portal);
}
-int idxd_wq_abort(struct idxd_wq *wq)
+int idxd_wq_abort(struct idxd_wq *wq, u32 *status)
{
struct idxd_device *idxd = wq->idxd;
struct device *dev = &idxd->pdev->dev;
- u32 operand, status;
+ u32 operand, stat;
dev_dbg(dev, "Abort WQ %d\n", wq->id);
if (wq->state != IDXD_WQ_ENABLED) {
@@ -321,9 +338,13 @@ int idxd_wq_abort(struct idxd_wq *wq)
operand = BIT(wq->id % 16) | ((wq->id / 16) << 16);
dev_dbg(dev, "cmd: %u operand: %#x\n", IDXD_CMD_ABORT_WQ, operand);
- idxd_cmd_exec(idxd, IDXD_CMD_ABORT_WQ, operand, &status);
- if (status != IDXD_CMDSTS_SUCCESS) {
- dev_dbg(dev, "WQ abort failed: %#x\n", status);
+ idxd_cmd_exec(idxd, IDXD_CMD_ABORT_WQ, operand, &stat);
+
+ if (status)
+ *status = stat;
+
+ if (stat != IDXD_CMDSTS_SUCCESS) {
+ dev_dbg(dev, "WQ abort failed: %#x\n", stat);
return -ENXIO;
}
@@ -339,7 +360,7 @@ int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid)
unsigned int offset;
unsigned long flags;
- rc = idxd_wq_disable(wq);
+ rc = idxd_wq_disable(wq, NULL);
if (rc < 0)
return rc;
@@ -351,7 +372,7 @@ int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid)
iowrite32(wqcfg.bits[WQCFG_PASID_IDX], idxd->reg_base + offset);
spin_unlock_irqrestore(&idxd->dev_lock, flags);
- rc = idxd_wq_enable(wq);
+ rc = idxd_wq_enable(wq, NULL);
if (rc < 0)
return rc;
@@ -366,7 +387,7 @@ int idxd_wq_disable_pasid(struct idxd_wq *wq)
unsigned int offset;
unsigned long flags;
- rc = idxd_wq_disable(wq);
+ rc = idxd_wq_disable(wq, NULL);
if (rc < 0)
return rc;
@@ -378,7 +399,7 @@ int idxd_wq_disable_pasid(struct idxd_wq *wq)
iowrite32(wqcfg.bits[WQCFG_PASID_IDX], idxd->reg_base + offset);
spin_unlock_irqrestore(&idxd->dev_lock, flags);
- rc = idxd_wq_enable(wq);
+ rc = idxd_wq_enable(wq, NULL);
if (rc < 0)
return rc;
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 67428c8d476d..41eee987c9b7 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -376,9 +376,9 @@ int idxd_device_release_int_handle(struct idxd_device *idxd, int handle,
/* work queue control */
int idxd_wq_alloc_resources(struct idxd_wq *wq);
void idxd_wq_free_resources(struct idxd_wq *wq);
-int idxd_wq_enable(struct idxd_wq *wq);
-int idxd_wq_disable(struct idxd_wq *wq);
-void idxd_wq_drain(struct idxd_wq *wq);
+int idxd_wq_enable(struct idxd_wq *wq, u32 *status);
+int idxd_wq_disable(struct idxd_wq *wq, u32 *status);
+int idxd_wq_drain(struct idxd_wq *wq, u32 *status);
int idxd_wq_map_portal(struct idxd_wq *wq);
void idxd_wq_unmap_portal(struct idxd_wq *wq);
void idxd_wq_disable_cleanup(struct idxd_wq *wq);
@@ -386,7 +386,7 @@ int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid);
int idxd_wq_disable_pasid(struct idxd_wq *wq);
void idxd_wq_quiesce(struct idxd_wq *wq);
int idxd_wq_init_percpu_ref(struct idxd_wq *wq);
-int idxd_wq_abort(struct idxd_wq *wq);
+int idxd_wq_abort(struct idxd_wq *wq, u32 *status);
void idxd_wq_setup_pasid(struct idxd_wq *wq, int pasid);
void idxd_wq_setup_priv(struct idxd_wq *wq, int priv);
diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c
index a60ca11a5784..090926856df3 100644
--- a/drivers/dma/idxd/irq.c
+++ b/drivers/dma/idxd/irq.c
@@ -48,7 +48,7 @@ static void idxd_device_reinit(struct work_struct *work)
struct idxd_wq *wq = &idxd->wqs[i];
if (wq->state == IDXD_WQ_ENABLED) {
- rc = idxd_wq_enable(wq);
+ rc = idxd_wq_enable(wq, NULL);
if (rc < 0) {
dev_warn(dev, "Unable to re-enable wq %s\n",
dev_name(&wq->conf_dev));
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index d985a0ac23d9..913ff019fe36 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -189,7 +189,7 @@ static int enable_wq(struct idxd_wq *wq)
return rc;
}
- rc = idxd_wq_enable(wq);
+ rc = idxd_wq_enable(wq, NULL);
if (rc < 0) {
mutex_unlock(&wq->wq_lock);
dev_warn(dev, "WQ %d enabling failed: %d\n", wq->id, rc);
@@ -199,7 +199,7 @@ static int enable_wq(struct idxd_wq *wq)
rc = idxd_wq_map_portal(wq);
if (rc < 0) {
dev_warn(dev, "wq portal mapping failed: %d\n", rc);
- rc = idxd_wq_disable(wq);
+ rc = idxd_wq_disable(wq, NULL);
if (rc < 0)
dev_warn(dev, "IDXD wq disable failed\n");
mutex_unlock(&wq->wq_lock);
@@ -321,8 +321,8 @@ static void disable_wq(struct idxd_wq *wq)
idxd_wq_unmap_portal(wq);
- idxd_wq_drain(wq);
- rc = idxd_wq_disable(wq);
+ idxd_wq_drain(wq, NULL);
+ rc = idxd_wq_disable(wq, NULL);
idxd_wq_free_resources(wq);
wq->client_count = 0;
diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c
index 7529396f3812..67e6b33468cd 100644
--- a/drivers/vfio/mdev/idxd/mdev.c
+++ b/drivers/vfio/mdev/idxd/mdev.c
@@ -117,7 +117,7 @@ static void idxd_vdcm_init(struct vdcm_idxd *vidxd)
vidxd_mmio_init(vidxd);
if (wq_dedicated(wq) && wq->state == IDXD_WQ_ENABLED)
- idxd_wq_disable(wq);
+ idxd_wq_disable(wq, NULL);
}
static void idxd_vdcm_release(struct mdev_device *mdev)
Add support to bypass the host for IMS interrupts configured for the guest.
The host interrupt backing each IMS vector is registered as an irq bypass
producer, with the guest's eventfd trigger as the token, so that a paired
consumer (e.g. KVM posted interrupts) can inject the interrupt directly
into the guest.
Signed-off-by: Dave Jiang <[email protected]>
---
drivers/vfio/mdev/Kconfig | 1 +
drivers/vfio/mdev/idxd/mdev.c | 17 +++++++++++++++--
drivers/vfio/mdev/idxd/mdev.h | 1 +
3 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/drivers/vfio/mdev/Kconfig b/drivers/vfio/mdev/Kconfig
index e9540e43d1f1..ab0a6f0930bc 100644
--- a/drivers/vfio/mdev/Kconfig
+++ b/drivers/vfio/mdev/Kconfig
@@ -22,6 +22,7 @@ config VFIO_MDEV_IDXD
depends on VFIO && VFIO_MDEV && X86_64
select AUXILIARY_BUS
select IMS_MSI_ARRAY
+ select IRQ_BYPASS_MANAGER
default n
help
VFIO based mediated device driver for Intel Accelerator Devices driver.
diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c
index 8a4af882a47f..d59920f78109 100644
--- a/drivers/vfio/mdev/idxd/mdev.c
+++ b/drivers/vfio/mdev/idxd/mdev.c
@@ -616,9 +616,13 @@ static int msix_trigger_unregister(struct vdcm_idxd *vidxd, int index)
dev_dbg(dev, "disable MSIX trigger %d\n", index);
if (index) {
+ struct irq_bypass_producer *producer;
u32 auxval;
+ producer = &vidxd->vdev.producer[index];
+ irq_bypass_unregister_producer(producer);
irq_entry = &vidxd->irq_entries[index];
+
if (irq_entry->irq_set) {
free_irq(irq_entry->irq, irq_entry);
irq_entry->irq_set = false;
@@ -654,9 +658,10 @@ static int msix_trigger_register(struct vdcm_idxd *vidxd, u32 fd, int index)
}
if (index) {
- u32 pasid;
- u32 auxval;
+ struct irq_bypass_producer *producer;
+ u32 pasid, auxval;
+ producer = &vidxd->vdev.producer[index];
irq_entry = &vidxd->irq_entries[index];
rc = idxd_mdev_get_pasid(mdev, &pasid);
if (rc < 0)
@@ -682,6 +687,14 @@ static int msix_trigger_register(struct vdcm_idxd *vidxd, u32 fd, int index)
irq_set_auxdata(irq_entry->irq, IMS_AUXDATA_CONTROL_WORD, auxval);
return rc;
}
+
+ producer->token = trigger;
+ producer->irq = irq_entry->irq;
+ rc = irq_bypass_register_producer(producer);
+ if (unlikely(rc))
+ dev_info(dev, "irq bypass producer (token %p) registration failed: %d\n",
+ producer->token, rc);
+
irq_entry->irq_set = true;
}
diff --git a/drivers/vfio/mdev/idxd/mdev.h b/drivers/vfio/mdev/idxd/mdev.h
index 8421b4962ac7..1f867de416e7 100644
--- a/drivers/vfio/mdev/idxd/mdev.h
+++ b/drivers/vfio/mdev/idxd/mdev.h
@@ -45,6 +45,7 @@ struct idxd_vdev {
struct mdev_device *mdev;
struct vfio_group *vfio_group;
struct eventfd_ctx *msix_trigger[VIDXD_MAX_MSIX_ENTRIES];
+ struct irq_bypass_producer producer[VIDXD_MAX_MSIX_ENTRIES];
};
struct vdcm_idxd {
Add emulation routines for PCI config read/write, MMIO read/write, and
an interrupt handling routine for the emulated device. The rw routines are
called when PCI config or BAR0 MMIO read/writes are issued by the guest
kernel through KVM/QEMU.
Because only read-only configuration is supported, most of the MMIO
emulation is a simple memory copy, except for cases such as handling device
commands and interrupts.
As part of the emulation code, add the support code for the "1dwq" mdev type.
This mdev type follows the standard VFIO mdev flow. The "1dwq" type will
export a single dedicated wq to the mdev. The dwq will have read-only
configuration that is configured by the host. The mdev type does not
support PASID and SVA and will match the stage 1 driver in functional
support. For backward compatibility, the mdev will maintain the DSA
spec definition of this mdev type once the commit goes upstream.
Signed-off-by: Dave Jiang <[email protected]>
---
drivers/dma/idxd/registers.h | 10 +
drivers/vfio/mdev/idxd/vdev.c | 456 ++++++++++++++++++++++++++++++++++++++++-
drivers/vfio/mdev/idxd/vdev.h | 8 +
include/uapi/linux/idxd.h | 2
4 files changed, 468 insertions(+), 8 deletions(-)
diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h
index d9a732decdd5..50ea94259c99 100644
--- a/drivers/dma/idxd/registers.h
+++ b/drivers/dma/idxd/registers.h
@@ -195,7 +195,8 @@ union cmdsts_reg {
};
u32 bits;
} __packed;
-#define IDXD_CMDSTS_ACTIVE 0x80000000
+#define IDXD_CMDS_ACTIVE_BIT 31
+#define IDXD_CMDSTS_ACTIVE BIT(IDXD_CMDS_ACTIVE_BIT)
#define IDXD_CMDSTS_ERR_MASK 0xff
#define IDXD_CMDSTS_RES_SHIFT 8
@@ -278,6 +279,11 @@ union msix_perm {
u32 bits;
} __packed;
+#define IDXD_MSIX_PERM_MASK 0xfffff00c
+#define IDXD_MSIX_PERM_IGNORE 0x3
+#define MSIX_ENTRY_MASK_INT 0x1
+#define MSIX_ENTRY_CTRL_BYTE 12
+
union group_flags {
struct {
u32 tc_a:3;
@@ -349,6 +355,8 @@ union wqcfg {
#define WQCFG_PASID_IDX 2
#define WQCFG_PRIV_IDX 2
+#define WQCFG_MODE_DEDICATED 1
+#define WQCFG_MODE_SHARED 0
/*
* This macro calculates the offset into the WQCFG register
diff --git a/drivers/vfio/mdev/idxd/vdev.c b/drivers/vfio/mdev/idxd/vdev.c
index 766753a2ec53..958b09987e5c 100644
--- a/drivers/vfio/mdev/idxd/vdev.c
+++ b/drivers/vfio/mdev/idxd/vdev.c
@@ -25,35 +25,472 @@
int vidxd_send_interrupt(struct ims_irq_entry *iie)
{
- /* PLACE HOLDER */
+ struct vdcm_idxd *vidxd = iie->vidxd;
+ struct device *dev = &vidxd->idxd->pdev->dev;
+ int rc;
+
+	dev_dbg(dev, "%s interrupt %d\n", __func__, iie->id);
+
+ if (!vidxd->vdev.msix_trigger[iie->id]) {
+ dev_warn(dev, "%s: intr eventfd not found %d\n", __func__, iie->id);
+ return -EINVAL;
+ }
+
+ rc = eventfd_signal(vidxd->vdev.msix_trigger[iie->id], 1);
+ if (rc != 1)
+		dev_err(dev, "eventfd signal failed on wq(%d) vector(%d): %d\n",
+			vidxd->wq->id, iie->id, rc);
+ else
+ dev_dbg(dev, "vidxd interrupt triggered wq(%d) %d\n", vidxd->wq->id, iie->id);
+
+ return rc;
+}
+
+static void vidxd_report_error(struct vdcm_idxd *vidxd, unsigned int error)
+{
+ u8 *bar0 = vidxd->bar0;
+ union sw_err_reg *swerr = (union sw_err_reg *)(bar0 + IDXD_SWERR_OFFSET);
+ union genctrl_reg *genctrl;
+ bool send = false;
+
+ if (!swerr->valid) {
+ memset(swerr, 0, sizeof(*swerr));
+ swerr->valid = 1;
+ swerr->error = error;
+ send = true;
+ } else if (swerr->valid && !swerr->overflow) {
+ swerr->overflow = 1;
+ }
+
+ genctrl = (union genctrl_reg *)(bar0 + IDXD_GENCTRL_OFFSET);
+ if (send && genctrl->softerr_int_en) {
+ u32 *intcause = (u32 *)(bar0 + IDXD_INTCAUSE_OFFSET);
+
+ *intcause |= IDXD_INTC_ERR;
+ vidxd_send_interrupt(&vidxd->irq_entries[0]);
+ }
+}
+
+int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size)
+{
+ u32 offset = pos & (vidxd->bar_size[0] - 1);
+ u8 *bar0 = vidxd->bar0;
+ struct device *dev = mdev_dev(vidxd->vdev.mdev);
+
+ dev_dbg(dev, "vidxd mmio W %d %x %x: %llx\n", vidxd->wq->id, size,
+ offset, get_reg_val(buf, size));
+
+ if (((size & (size - 1)) != 0) || (offset & (size - 1)) != 0)
+ return -EINVAL;
+
+	/* If we don't limit this, we could potentially write out of bounds */
+ if (size > sizeof(u32))
+ return -EINVAL;
+
+ switch (offset) {
+ case IDXD_GENCFG_OFFSET ... IDXD_GENCFG_OFFSET + 3:
+ /* Write only when device is disabled. */
+ if (vidxd_state(vidxd) == IDXD_DEVICE_STATE_DISABLED)
+ memcpy(bar0 + offset, buf, size);
+ break;
+
+ case IDXD_GENCTRL_OFFSET:
+ memcpy(bar0 + offset, buf, size);
+ break;
+
+ case IDXD_INTCAUSE_OFFSET:
+ bar0[offset] &= ~(get_reg_val(buf, 1) & GENMASK(4, 0));
+ break;
+
+ case IDXD_CMD_OFFSET: {
+ u32 *cmdsts = (u32 *)(bar0 + IDXD_CMDSTS_OFFSET);
+ u32 val = get_reg_val(buf, size);
+
+ if (size != sizeof(u32))
+ return -EINVAL;
+
+ /* Check and set command in progress */
+ if (test_and_set_bit(IDXD_CMDS_ACTIVE_BIT, (unsigned long *)cmdsts) == 0)
+ vidxd_do_command(vidxd, val);
+ else
+ vidxd_report_error(vidxd, DSA_ERR_CMD_REG);
+ break;
+ }
+
+ case IDXD_SWERR_OFFSET:
+ /* W1C */
+ bar0[offset] &= ~(get_reg_val(buf, 1) & GENMASK(1, 0));
+ break;
+
+ case VIDXD_WQCFG_OFFSET ... VIDXD_WQCFG_OFFSET + VIDXD_WQ_CTRL_SZ - 1:
+ case VIDXD_GRPCFG_OFFSET ... VIDXD_GRPCFG_OFFSET + VIDXD_GRP_CTRL_SZ - 1:
+ /* Nothing is written. Should be all RO */
+ break;
+
+ case VIDXD_MSIX_TABLE_OFFSET ... VIDXD_MSIX_TABLE_OFFSET + VIDXD_MSIX_TBL_SZ - 1: {
+ int index = (offset - VIDXD_MSIX_TABLE_OFFSET) / 0x10;
+ u8 *msix_entry = &bar0[VIDXD_MSIX_TABLE_OFFSET + index * 0x10];
+ u64 *pba = (u64 *)(bar0 + VIDXD_MSIX_PBA_OFFSET);
+ u8 ctrl;
+
+ ctrl = msix_entry[MSIX_ENTRY_CTRL_BYTE];
+ memcpy(bar0 + offset, buf, size);
+ /* Handle clearing of UNMASK bit */
+ if (!(msix_entry[MSIX_ENTRY_CTRL_BYTE] & MSIX_ENTRY_MASK_INT) &&
+ ctrl & MSIX_ENTRY_MASK_INT)
+ if (test_and_clear_bit(index, (unsigned long *)pba))
+ vidxd_send_interrupt(&vidxd->irq_entries[index]);
+ break;
+ }
+
+ case VIDXD_MSIX_PERM_OFFSET ... VIDXD_MSIX_PERM_OFFSET + VIDXD_MSIX_PERM_TBL_SZ - 1:
+ memcpy(bar0 + offset, buf, size);
+ break;
+ } /* offset */
+
return 0;
}
int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size)
{
- /* PLACEHOLDER */
+ u32 offset = pos & (vidxd->bar_size[0] - 1);
+ struct device *dev = mdev_dev(vidxd->vdev.mdev);
+
+ memcpy(buf, vidxd->bar0 + offset, size);
+
+ dev_dbg(dev, "vidxd mmio R %d %x %x: %llx\n",
+ vidxd->wq->id, size, offset, get_reg_val(buf, size));
return 0;
}
-int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size)
+int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count)
{
- /* PLACEHOLDER */
+ u32 offset = pos & 0xfff;
+ struct device *dev = mdev_dev(vidxd->vdev.mdev);
+
+ memcpy(buf, &vidxd->cfg[offset], count);
+
+ dev_dbg(dev, "vidxd pci R %d %x %x: %llx\n",
+ vidxd->wq->id, count, offset, get_reg_val(buf, count));
+
return 0;
}
-int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count)
+/*
+ * Much of the emulation code has been borrowed from Intel i915 cfg space
+ * emulation code.
+ * drivers/gpu/drm/i915/gvt/cfg_space.c:
+ */
+
+/*
+ * Bitmap for writable bits (RW or RW1C bits, but cannot co-exist in one
+ * byte) byte by byte in standard pci configuration space. (not the full
+ * 256 bytes.)
+ */
+static const u8 pci_cfg_space_rw_bmp[PCI_INTERRUPT_LINE + 4] = {
+ [PCI_COMMAND] = 0xff, 0x07,
+ [PCI_STATUS] = 0x00, 0xf9, /* the only one RW1C byte */
+ [PCI_CACHE_LINE_SIZE] = 0xff,
+ [PCI_BASE_ADDRESS_0 ... PCI_CARDBUS_CIS - 1] = 0xff,
+ [PCI_ROM_ADDRESS] = 0x01, 0xf8, 0xff, 0xff,
+ [PCI_INTERRUPT_LINE] = 0xff,
+};
+
+static void _pci_cfg_mem_write(struct vdcm_idxd *vidxd, unsigned int off, u8 *src,
+ unsigned int bytes)
{
- /* PLACEHOLDER */
+ u8 *cfg_base = vidxd->cfg;
+ u8 mask, new, old;
+ int i = 0;
+
+ for (; i < bytes && (off + i < sizeof(pci_cfg_space_rw_bmp)); i++) {
+ mask = pci_cfg_space_rw_bmp[off + i];
+ old = cfg_base[off + i];
+ new = src[i] & mask;
+
+		/*
+		 * The PCI_STATUS high byte has RW1C bits: writing a 1
+		 * to such a bit clears it, while writing a 0 has no
+		 * effect.
+		 */
+ if (off + i == PCI_STATUS + 1)
+ new = (~new & old) & mask;
+
+ cfg_base[off + i] = (old & ~mask) | new;
+ }
+
+ /* For other configuration space directly copy as it is. */
+ if (i < bytes)
+ memcpy(cfg_base + off + i, src + i, bytes - i);
+}
+
+static inline void _write_pci_bar(struct vdcm_idxd *vidxd, u32 offset, u32 val, bool low)
+{
+ u32 *pval;
+
+	/* BAR offsets are 32-bit aligned */
+ offset = rounddown(offset, 4);
+ pval = (u32 *)(vidxd->cfg + offset);
+
+ if (low) {
+ /*
+ * only update bit 31 - bit 4,
+ * leave the bit 3 - bit 0 unchanged.
+ */
+ *pval = (val & GENMASK(31, 4)) | (*pval & GENMASK(3, 0));
+ } else {
+ *pval = val;
+ }
+}
+
+static int _pci_cfg_bar_write(struct vdcm_idxd *vidxd, unsigned int offset, void *p_data,
+ unsigned int bytes)
+{
+ u32 new = *(u32 *)(p_data);
+ bool lo = IS_ALIGNED(offset, 8);
+ u64 size;
+ unsigned int bar_id;
+
+ /*
+ * Power-up software can determine how much address
+ * space the device requires by writing a value of
+ * all 1's to the register and then reading the value
+ * back. The device will return 0's in all don't-care
+ * address bits.
+ */
+ if (new == 0xffffffff) {
+ switch (offset) {
+ case PCI_BASE_ADDRESS_0:
+ case PCI_BASE_ADDRESS_1:
+ case PCI_BASE_ADDRESS_2:
+ case PCI_BASE_ADDRESS_3:
+ bar_id = (offset - PCI_BASE_ADDRESS_0) / 8;
+ size = vidxd->bar_size[bar_id];
+ _write_pci_bar(vidxd, offset, size >> (lo ? 0 : 32), lo);
+ break;
+ default:
+ /* Unimplemented BARs */
+ _write_pci_bar(vidxd, offset, 0x0, false);
+ }
+ } else {
+ switch (offset) {
+ case PCI_BASE_ADDRESS_0:
+ case PCI_BASE_ADDRESS_1:
+ case PCI_BASE_ADDRESS_2:
+ case PCI_BASE_ADDRESS_3:
+ _write_pci_bar(vidxd, offset, new, lo);
+ break;
+ default:
+ break;
+ }
+ }
return 0;
}
int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int size)
{
- /* PLACEHOLDER */
+ struct device *dev = &vidxd->idxd->pdev->dev;
+
+ if (size > 4)
+ return -EINVAL;
+
+ if (pos + size > VIDXD_MAX_CFG_SPACE_SZ)
+ return -EINVAL;
+
+ dev_dbg(dev, "vidxd pci W %d %x %x: %llx\n", vidxd->wq->id, size, pos,
+ get_reg_val(buf, size));
+
+ /* First check if it's PCI_COMMAND */
+ if (IS_ALIGNED(pos, 2) && pos == PCI_COMMAND) {
+ bool new_bme;
+ bool bme;
+
+ if (size > 2)
+ return -EINVAL;
+
+ new_bme = !!(get_reg_val(buf, 2) & PCI_COMMAND_MASTER);
+ bme = !!(vidxd->cfg[pos] & PCI_COMMAND_MASTER);
+ _pci_cfg_mem_write(vidxd, pos, buf, size);
+
+ /* Flag error if turning off BME while device is enabled */
+ if ((bme && !new_bme) && vidxd_state(vidxd) == IDXD_DEVICE_STATE_ENABLED)
+ vidxd_report_error(vidxd, DSA_ERR_PCI_CFG);
+ return 0;
+ }
+
+ switch (pos) {
+ case PCI_BASE_ADDRESS_0 ... PCI_BASE_ADDRESS_5:
+ if (!IS_ALIGNED(pos, 4))
+ return -EINVAL;
+ return _pci_cfg_bar_write(vidxd, pos, buf, size);
+
+ default:
+ _pci_cfg_mem_write(vidxd, pos, buf, size);
+ }
return 0;
}
+static void vidxd_mmio_init_grpcap(struct vdcm_idxd *vidxd)
+{
+ u8 *bar0 = vidxd->bar0;
+ union group_cap_reg *grp_cap = (union group_cap_reg *)(bar0 + IDXD_GRPCAP_OFFSET);
+
+ /* single group for current implementation */
+ grp_cap->token_en = 0;
+ grp_cap->token_limit = 0;
+ grp_cap->total_tokens = 0;
+ grp_cap->num_groups = 1;
+}
+
+static void vidxd_mmio_init_grpcfg(struct vdcm_idxd *vidxd)
+{
+ u8 *bar0 = vidxd->bar0;
+ struct grpcfg *grpcfg = (struct grpcfg *)(bar0 + VIDXD_GRPCFG_OFFSET);
+ struct idxd_wq *wq = vidxd->wq;
+ struct idxd_group *group = wq->group;
+ int i;
+
+ /*
+	 * At this point, we are only exporting a single workqueue for
+	 * each mdev. So we expose it as the first workqueue and also
+	 * mark the available engines in this group.
+ */
+
+ /* Set single workqueue and the first one */
+ grpcfg->wqs[0] = BIT(0);
+ grpcfg->engines = 0;
+ for (i = 0; i < group->num_engines; i++)
+ grpcfg->engines |= BIT(i);
+ grpcfg->flags.bits = group->grpcfg.flags.bits;
+}
+
+static void vidxd_mmio_init_wqcap(struct vdcm_idxd *vidxd)
+{
+ u8 *bar0 = vidxd->bar0;
+ struct idxd_wq *wq = vidxd->wq;
+ union wq_cap_reg *wq_cap = (union wq_cap_reg *)(bar0 + IDXD_WQCAP_OFFSET);
+
+ wq_cap->occupancy_int = 0;
+ wq_cap->occupancy = 0;
+ wq_cap->priority = 0;
+ wq_cap->total_wq_size = wq->size;
+ wq_cap->num_wqs = VIDXD_MAX_WQS;
+ wq_cap->wq_ats_support = 0;
+ wq_cap->dedicated_mode = 1;
+ wq_cap->shared_mode = 0;
+}
+
+static void vidxd_mmio_init_wqcfg(struct vdcm_idxd *vidxd)
+{
+ struct idxd_device *idxd = vidxd->idxd;
+ struct idxd_wq *wq = vidxd->wq;
+ u8 *bar0 = vidxd->bar0;
+ union wqcfg *wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET);
+
+ wqcfg->wq_size = wq->size;
+ wqcfg->wq_thresh = wq->threshold;
+
+ wqcfg->mode = WQCFG_MODE_DEDICATED;
+
+ wqcfg->bof = 0;
+
+ wqcfg->priority = wq->priority;
+ wqcfg->max_xfer_shift = idxd->hw.gen_cap.max_xfer_shift;
+ wqcfg->max_batch_shift = idxd->hw.gen_cap.max_batch_shift;
+ /* make mode change read-only */
+ wqcfg->mode_support = 0;
+}
+
+static void vidxd_mmio_init_engcap(struct vdcm_idxd *vidxd)
+{
+ u8 *bar0 = vidxd->bar0;
+ union engine_cap_reg *engcap = (union engine_cap_reg *)(bar0 + IDXD_ENGCAP_OFFSET);
+ struct idxd_wq *wq = vidxd->wq;
+ struct idxd_group *group = wq->group;
+
+ engcap->num_engines = group->num_engines;
+}
+
+static void vidxd_mmio_init_gencap(struct vdcm_idxd *vidxd)
+{
+ struct idxd_device *idxd = vidxd->idxd;
+ u8 *bar0 = vidxd->bar0;
+ union gen_cap_reg *gencap = (union gen_cap_reg *)(bar0 + IDXD_GENCAP_OFFSET);
+
+ gencap->bits = idxd->hw.gen_cap.bits;
+ gencap->config_en = 0;
+ gencap->max_ims_mult = 0;
+ gencap->cmd_cap = 1;
+ gencap->block_on_fault = 0;
+}
+
+static void vidxd_mmio_init_cmdcap(struct vdcm_idxd *vidxd)
+{
+ struct idxd_device *idxd = vidxd->idxd;
+ u8 *bar0 = vidxd->bar0;
+ u32 *cmdcap = (u32 *)(bar0 + IDXD_CMDCAP_OFFSET);
+
+ if (idxd->hw.cmd_cap)
+ *cmdcap = idxd->hw.cmd_cap;
+ else
+ *cmdcap = 0x1ffe;
+
+ *cmdcap |= BIT(IDXD_CMD_REQUEST_INT_HANDLE) | BIT(IDXD_CMD_RELEASE_INT_HANDLE);
+}
+
+static void vidxd_mmio_init_opcap(struct vdcm_idxd *vidxd)
+{
+ u64 opcode;
+ u8 *bar0 = vidxd->bar0;
+ u64 *opcap = (u64 *)(bar0 + IDXD_OPCAP_OFFSET);
+
+ opcode = BIT_ULL(DSA_OPCODE_NOOP) | BIT_ULL(DSA_OPCODE_BATCH) |
+ BIT_ULL(DSA_OPCODE_DRAIN) | BIT_ULL(DSA_OPCODE_MEMMOVE) |
+ BIT_ULL(DSA_OPCODE_MEMFILL) | BIT_ULL(DSA_OPCODE_COMPARE) |
+ BIT_ULL(DSA_OPCODE_COMPVAL) | BIT_ULL(DSA_OPCODE_CR_DELTA) |
+ BIT_ULL(DSA_OPCODE_AP_DELTA) | BIT_ULL(DSA_OPCODE_DUALCAST) |
+ BIT_ULL(DSA_OPCODE_CRCGEN) | BIT_ULL(DSA_OPCODE_COPY_CRC) |
+ BIT_ULL(DSA_OPCODE_DIF_CHECK) | BIT_ULL(DSA_OPCODE_DIF_INS) |
+ BIT_ULL(DSA_OPCODE_DIF_STRP) | BIT_ULL(DSA_OPCODE_DIF_UPDT) |
+ BIT_ULL(DSA_OPCODE_CFLUSH);
+ *opcap = opcode;
+}
+
+static void vidxd_mmio_init_version(struct vdcm_idxd *vidxd)
+{
+ struct idxd_device *idxd = vidxd->idxd;
+ u32 *version;
+
+ version = (u32 *)vidxd->bar0;
+ *version = idxd->hw.version;
+}
+
void vidxd_mmio_init(struct vdcm_idxd *vidxd)
+{
+ u8 *bar0 = vidxd->bar0;
+ union offsets_reg *offsets;
+
+ memset(vidxd->bar0, 0, VIDXD_BAR0_SIZE);
+
+ vidxd_mmio_init_version(vidxd);
+ vidxd_mmio_init_gencap(vidxd);
+ vidxd_mmio_init_wqcap(vidxd);
+ vidxd_mmio_init_grpcap(vidxd);
+ vidxd_mmio_init_engcap(vidxd);
+ vidxd_mmio_init_opcap(vidxd);
+
+ offsets = (union offsets_reg *)(bar0 + IDXD_TABLE_OFFSET);
+ offsets->grpcfg = VIDXD_GRPCFG_OFFSET / 0x100;
+ offsets->wqcfg = VIDXD_WQCFG_OFFSET / 0x100;
+ offsets->msix_perm = VIDXD_MSIX_PERM_OFFSET / 0x100;
+
+ vidxd_mmio_init_cmdcap(vidxd);
+ memset(bar0 + VIDXD_MSIX_PERM_OFFSET, 0, VIDXD_MSIX_PERM_TBL_SZ);
+ vidxd_mmio_init_grpcfg(vidxd);
+ vidxd_mmio_init_wqcfg(vidxd);
+}
+
+static void idxd_complete_command(struct vdcm_idxd *vidxd, enum idxd_cmdsts_err val)
{
/* PLACEHOLDER */
}
@@ -63,6 +500,11 @@ void vidxd_reset(struct vdcm_idxd *vidxd)
/* PLACEHOLDER */
}
+void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val)
+{
+ /* PLACEHOLDER */
+}
+
int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd)
{
/* PLACEHOLDER */
diff --git a/drivers/vfio/mdev/idxd/vdev.h b/drivers/vfio/mdev/idxd/vdev.h
index cc2ba6ccff7b..fc0f405baa40 100644
--- a/drivers/vfio/mdev/idxd/vdev.h
+++ b/drivers/vfio/mdev/idxd/vdev.h
@@ -6,6 +6,13 @@
#include "mdev.h"
+static inline u8 vidxd_state(struct vdcm_idxd *vidxd)
+{
+ union gensts_reg *gensts = (union gensts_reg *)(vidxd->bar0 + IDXD_GENSTATS_OFFSET);
+
+ return gensts->state;
+}
+
int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size);
int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size);
int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count);
@@ -15,5 +22,6 @@ void vidxd_reset(struct vdcm_idxd *vidxd);
int vidxd_send_interrupt(struct ims_irq_entry *iie);
int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd);
void vidxd_free_ims_entries(struct vdcm_idxd *vidxd);
+void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val);
#endif
diff --git a/include/uapi/linux/idxd.h b/include/uapi/linux/idxd.h
index 236d437947bc..22d1b229a912 100644
--- a/include/uapi/linux/idxd.h
+++ b/include/uapi/linux/idxd.h
@@ -89,6 +89,8 @@ enum dsa_completion_status {
DSA_COMP_HW_ERR1,
DSA_COMP_HW_ERR_DRB,
DSA_COMP_TRANSLATION_FAIL,
+ DSA_ERR_PCI_CFG = 0x51,
+ DSA_ERR_CMD_REG,
};
enum iax_completion_status {
Add mdev device type "1dwq-v1" support code. 1dwq-v1 is defined as a
single DSA gen1 dedicated WQ. This WQ cannot be shared between guests. The
guest also cannot change any WQ configuration.
Signed-off-by: Dave Jiang <[email protected]>
---
drivers/dma/idxd/sysfs.c | 1
drivers/vfio/mdev/idxd/mdev.c | 216 +++++++++++++++++++++++++++++++++++++++--
2 files changed, 207 insertions(+), 10 deletions(-)
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index 13d20cbd4cf6..d985a0ac23d9 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -84,6 +84,7 @@ inline bool is_idxd_wq_mdev(struct idxd_wq *wq)
{
return wq->type == IDXD_WQT_MDEV ? true : false;
}
+EXPORT_SYMBOL_GPL(is_idxd_wq_mdev);
static int idxd_config_bus_match(struct device *dev,
struct device_driver *drv)
diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c
index 384ba5d6bc2b..7529396f3812 100644
--- a/drivers/vfio/mdev/idxd/mdev.c
+++ b/drivers/vfio/mdev/idxd/mdev.c
@@ -46,6 +46,9 @@ static u64 idxd_pci_config[] = {
0x0000000000000000ULL,
};
+static char idxd_dsa_1dwq_name[IDXD_MDEV_NAME_LEN];
+static char idxd_iax_1dwq_name[IDXD_MDEV_NAME_LEN];
+
static int idxd_vdcm_set_irqs(struct vdcm_idxd *vidxd, uint32_t flags, unsigned int index,
unsigned int start, unsigned int count, void *data);
@@ -144,21 +147,70 @@ static void idxd_vdcm_release(struct mdev_device *mdev)
mutex_unlock(&vidxd->dev_lock);
}
+static struct idxd_wq *find_any_dwq(struct idxd_device *idxd, struct vdcm_idxd_type *type)
+{
+ int i;
+ struct idxd_wq *wq;
+ unsigned long flags;
+
+ switch (type->type) {
+ case IDXD_MDEV_TYPE_DSA_1_DWQ:
+ if (idxd->type != IDXD_TYPE_DSA)
+ return NULL;
+ break;
+ case IDXD_MDEV_TYPE_IAX_1_DWQ:
+ if (idxd->type != IDXD_TYPE_IAX)
+ return NULL;
+ break;
+ default:
+ return NULL;
+ }
+
+ spin_lock_irqsave(&idxd->dev_lock, flags);
+ for (i = 0; i < idxd->max_wqs; i++) {
+ wq = &idxd->wqs[i];
+
+ if (wq->state != IDXD_WQ_ENABLED)
+ continue;
+
+ if (!wq_dedicated(wq))
+ continue;
+
+ if (idxd_wq_refcount(wq) != 0)
+ continue;
+
+ spin_unlock_irqrestore(&idxd->dev_lock, flags);
+ mutex_lock(&wq->wq_lock);
+	if (idxd_wq_refcount(wq)) {
+		/* lost the race; drop the mutex before continuing */
+		mutex_unlock(&wq->wq_lock);
+		spin_lock_irqsave(&idxd->dev_lock, flags);
+		continue;
+	}
+
+ idxd_wq_get(wq);
+ mutex_unlock(&wq->wq_lock);
+ return wq;
+ }
+
+ spin_unlock_irqrestore(&idxd->dev_lock, flags);
+ return NULL;
+}
+
static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, struct mdev_device *mdev,
struct vdcm_idxd_type *type)
{
struct vdcm_idxd *vidxd;
struct idxd_wq *wq = NULL;
- int i;
-
- /* PLACEHOLDER, wq matching comes later */
+ int i, rc;
+ wq = find_any_dwq(idxd, type);
if (!wq)
return ERR_PTR(-ENODEV);
vidxd = kzalloc(sizeof(*vidxd), GFP_KERNEL);
- if (!vidxd)
- return ERR_PTR(-ENOMEM);
+ if (!vidxd) {
+ rc = -ENOMEM;
+ goto err;
+ }
mutex_init(&vidxd->dev_lock);
vidxd->idxd = idxd;
@@ -169,9 +221,6 @@ static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, struct mdev
vidxd->num_wqs = VIDXD_MAX_WQS;
idxd_vdcm_init(vidxd);
- mutex_lock(&wq->wq_lock);
- idxd_wq_get(wq);
- mutex_unlock(&wq->wq_lock);
for (i = 0; i < VIDXD_MAX_MSIX_ENTRIES; i++) {
vidxd->irq_entries[i].vidxd = vidxd;
@@ -179,9 +228,24 @@ static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, struct mdev
}
return vidxd;
+
+ err:
+ mutex_lock(&wq->wq_lock);
+ idxd_wq_put(wq);
+ mutex_unlock(&wq->wq_lock);
+ return ERR_PTR(rc);
}
-static struct vdcm_idxd_type idxd_mdev_types[IDXD_MDEV_TYPES];
+static struct vdcm_idxd_type idxd_mdev_types[IDXD_MDEV_TYPES] = {
+ {
+ .name = idxd_dsa_1dwq_name,
+ .type = IDXD_MDEV_TYPE_DSA_1_DWQ,
+ },
+ {
+ .name = idxd_iax_1dwq_name,
+ .type = IDXD_MDEV_TYPE_IAX_1_DWQ,
+ },
+};
static struct vdcm_idxd_type *idxd_vdcm_find_vidxd_type(struct device *dev,
const char *name)
@@ -965,7 +1029,94 @@ static long idxd_vdcm_ioctl(struct mdev_device *mdev, unsigned int cmd,
return rc;
}
-static const struct mdev_parent_ops idxd_vdcm_ops = {
+static ssize_t name_show(struct kobject *kobj, struct device *dev, char *buf)
+{
+ struct vdcm_idxd_type *type;
+
+ type = idxd_vdcm_find_vidxd_type(dev, kobject_name(kobj));
+
+ if (type)
+ return sprintf(buf, "%s\n", type->name);
+
+ return -EINVAL;
+}
+static MDEV_TYPE_ATTR_RO(name);
+
+static int find_available_mdev_instances(struct idxd_device *idxd, struct vdcm_idxd_type *type)
+{
+ int count = 0, i;
+ unsigned long flags;
+
+ switch (type->type) {
+ case IDXD_MDEV_TYPE_DSA_1_DWQ:
+ if (idxd->type != IDXD_TYPE_DSA)
+ return 0;
+ break;
+ case IDXD_MDEV_TYPE_IAX_1_DWQ:
+ if (idxd->type != IDXD_TYPE_IAX)
+ return 0;
+ break;
+ default:
+ return 0;
+ }
+
+ spin_lock_irqsave(&idxd->dev_lock, flags);
+ for (i = 0; i < idxd->max_wqs; i++) {
+ struct idxd_wq *wq;
+
+ wq = &idxd->wqs[i];
+ if (!is_idxd_wq_mdev(wq) || !wq_dedicated(wq) || idxd_wq_refcount(wq))
+ continue;
+
+ count++;
+ }
+ spin_unlock_irqrestore(&idxd->dev_lock, flags);
+
+ return count;
+}
+
+static ssize_t available_instances_show(struct kobject *kobj,
+ struct device *dev, char *buf)
+{
+ int count;
+ struct idxd_device *idxd = dev_get_drvdata(dev);
+ struct vdcm_idxd_type *type;
+
+ type = idxd_vdcm_find_vidxd_type(dev, kobject_name(kobj));
+ if (!type)
+ return -EINVAL;
+
+ count = find_available_mdev_instances(idxd, type);
+
+ return sprintf(buf, "%d\n", count);
+}
+static MDEV_TYPE_ATTR_RO(available_instances);
+
+static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
+ char *buf)
+{
+ return sprintf(buf, "%s\n", VFIO_DEVICE_API_PCI_STRING);
+}
+static MDEV_TYPE_ATTR_RO(device_api);
+
+static struct attribute *idxd_mdev_types_attrs[] = {
+ &mdev_type_attr_name.attr,
+ &mdev_type_attr_device_api.attr,
+ &mdev_type_attr_available_instances.attr,
+ NULL,
+};
+
+static struct attribute_group idxd_mdev_type_dsa_group0 = {
+ .name = idxd_dsa_1dwq_name,
+ .attrs = idxd_mdev_types_attrs,
+};
+
+static struct attribute_group idxd_mdev_type_iax_group0 = {
+ .name = idxd_iax_1dwq_name,
+ .attrs = idxd_mdev_types_attrs,
+};
+
+static struct mdev_parent_ops idxd_vdcm_ops = {
.create = idxd_vdcm_create,
.remove = idxd_vdcm_remove,
.open = idxd_vdcm_open,
@@ -976,6 +1127,43 @@ static const struct mdev_parent_ops idxd_vdcm_ops = {
.ioctl = idxd_vdcm_ioctl,
};
+/* Set the mdev type version to the hardware version supported */
+static void init_mdev_1dwq_name(struct idxd_device *idxd)
+{
+ unsigned int version;
+
+ version = (idxd->hw.version & GENMASK(15, 8)) >> 8;
+ if (idxd->type == IDXD_TYPE_DSA && strlen(idxd_dsa_1dwq_name) == 0)
+ sprintf(idxd_dsa_1dwq_name, "dsa-1dwq-v%u", version);
+ else if (idxd->type == IDXD_TYPE_IAX && strlen(idxd_iax_1dwq_name) == 0)
+ sprintf(idxd_iax_1dwq_name, "iax-1dwq-v%u", version);
+}
+
+static int alloc_supported_types(struct idxd_device *idxd)
+{
+ struct attribute_group **idxd_mdev_type_groups;
+
+ idxd_mdev_type_groups = kcalloc(2, sizeof(struct attribute_group *), GFP_KERNEL);
+ if (!idxd_mdev_type_groups)
+ return -ENOMEM;
+
+ switch (idxd->type) {
+ case IDXD_TYPE_DSA:
+ idxd_mdev_type_groups[0] = &idxd_mdev_type_dsa_group0;
+ break;
+ case IDXD_TYPE_IAX:
+ idxd_mdev_type_groups[0] = &idxd_mdev_type_iax_group0;
+ break;
+ case IDXD_TYPE_UNKNOWN:
+ default:
+ return -ENODEV;
+ }
+
+ idxd_vdcm_ops.supported_type_groups = idxd_mdev_type_groups;
+
+ return 0;
+}
+
int idxd_mdev_host_init(struct idxd_device *idxd)
{
struct device *dev = &idxd->pdev->dev;
@@ -984,6 +1172,11 @@ int idxd_mdev_host_init(struct idxd_device *idxd)
if (!test_bit(IDXD_FLAG_IMS_SUPPORTED, &idxd->flags))
return -EOPNOTSUPP;
+ init_mdev_1dwq_name(idxd);
+ rc = alloc_supported_types(idxd);
+ if (rc < 0)
+ return rc;
+
if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) {
rc = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX);
if (rc < 0) {
@@ -1010,6 +1203,9 @@ void idxd_mdev_host_release(struct idxd_device *idxd)
dev_warn(dev, "Failed to disable aux-domain: %d\n",
rc);
}
+
+ kfree(idxd_vdcm_ops.supported_type_groups);
+ idxd_vdcm_ops.supported_type_groups = NULL;
}
static int idxd_mdev_aux_probe(struct auxiliary_device *auxdev,
Add "mdev" wq type and support helpers. The mdev wq type marks the wq
to be utilized as a VFIO mediated device.
Signed-off-by: Dave Jiang <[email protected]>
---
drivers/dma/idxd/idxd.h | 2 ++
drivers/dma/idxd/sysfs.c | 13 +++++++++++--
2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index a271942df2be..67428c8d476d 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -73,6 +73,7 @@ enum idxd_wq_type {
IDXD_WQT_NONE = 0,
IDXD_WQT_KERNEL,
IDXD_WQT_USER,
+ IDXD_WQT_MDEV,
};
struct idxd_cdev {
@@ -344,6 +345,7 @@ void idxd_cleanup_sysfs(struct idxd_device *idxd);
int idxd_register_driver(void);
void idxd_unregister_driver(void);
struct bus_type *idxd_get_bus_type(struct idxd_device *idxd);
+bool is_idxd_wq_mdev(struct idxd_wq *wq);
/* device interrupt control */
irqreturn_t idxd_irq_handler(int vec, void *data);
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index ab5c76e1226b..13d20cbd4cf6 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -14,6 +14,7 @@ static char *idxd_wq_type_names[] = {
[IDXD_WQT_NONE] = "none",
[IDXD_WQT_KERNEL] = "kernel",
[IDXD_WQT_USER] = "user",
+ [IDXD_WQT_MDEV] = "mdev",
};
static void idxd_conf_device_release(struct device *dev)
@@ -79,6 +80,11 @@ static inline bool is_idxd_wq_cdev(struct idxd_wq *wq)
return wq->type == IDXD_WQT_USER;
}
+inline bool is_idxd_wq_mdev(struct idxd_wq *wq)
+{
+ return wq->type == IDXD_WQT_MDEV ? true : false;
+}
+
static int idxd_config_bus_match(struct device *dev,
struct device_driver *drv)
{
@@ -1151,8 +1157,9 @@ static ssize_t wq_type_show(struct device *dev,
return sprintf(buf, "%s\n",
idxd_wq_type_names[IDXD_WQT_KERNEL]);
case IDXD_WQT_USER:
- return sprintf(buf, "%s\n",
- idxd_wq_type_names[IDXD_WQT_USER]);
+ return sprintf(buf, "%s\n", idxd_wq_type_names[IDXD_WQT_USER]);
+ case IDXD_WQT_MDEV:
+ return sprintf(buf, "%s\n", idxd_wq_type_names[IDXD_WQT_MDEV]);
case IDXD_WQT_NONE:
default:
return sprintf(buf, "%s\n",
@@ -1179,6 +1186,8 @@ static ssize_t wq_type_store(struct device *dev,
wq->type = IDXD_WQT_KERNEL;
else if (sysfs_streq(buf, idxd_wq_type_names[IDXD_WQT_USER]))
wq->type = IDXD_WQT_USER;
+ else if (sysfs_streq(buf, idxd_wq_type_names[IDXD_WQT_MDEV]))
+ wq->type = IDXD_WQT_MDEV;
else
return -EINVAL;
Add idxd vfio mediated device theory of operation documentation.
Provide description on mdev design, usage, and why vfio mdev was chosen.
Reviewed-by: Ashok Raj <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Signed-off-by: Dave Jiang <[email protected]>
---
Documentation/driver-api/vfio/mdev-idxd.rst | 397 +++++++++++++++++++++++++++
MAINTAINERS | 1
2 files changed, 398 insertions(+)
create mode 100644 Documentation/driver-api/vfio/mdev-idxd.rst
diff --git a/Documentation/driver-api/vfio/mdev-idxd.rst b/Documentation/driver-api/vfio/mdev-idxd.rst
new file mode 100644
index 000000000000..9bf93eafc7c8
--- /dev/null
+++ b/Documentation/driver-api/vfio/mdev-idxd.rst
@@ -0,0 +1,397 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============
+IDXD Overview
+=============
+IDXD (Intel Data Accelerator Driver) is the driver for the Intel Data
+Streaming Accelerator (DSA). Intel DSA is a high performance data copy
+and transformation accelerator. In addition to data move operations,
+the device also supports data fill, CRC generation, Data Integrity Field
+(DIF), and memory compare and delta generation. Intel DSA supports
+a variety of PCI-SIG defined capabilities such as Address Translation
+Services (ATS), Process Address Space ID (PASID), Page Request Interface
+(PRI), Message Signalled Interrupts Extended (MSI-X), and Advanced Error
+Reporting (AER). Some of those capabilities enable the device to support
+Shared Virtual Memory (SVM), also known as Shared Virtual Addressing
+(SVA). Intel DSA also supports Intel Scalable I/O Virtualization (SIOV)
+to improve scalability of device assignment.
+
+
+The Intel DSA device contains the following basic components:
+* Work queue (WQ)
+
+  A WQ is on-device storage used to queue descriptors to the
+ device. Requests are added to a WQ by using new CPU instructions
+ (MOVDIR64B and ENQCMD(S)) to write the memory mapped “portal”
+ associated with each WQ.
+
+* Engine
+
+ Operation unit that pulls descriptors from WQs and processes them.
+
+* Group
+
+ Abstract container to associate one or more engines with one or more WQs.
+
+
+Two types of WQs are supported:
+* Dedicated WQ (DWQ)
+
+  A single client owns this WQ exclusively and can submit work
+  to it. The MOVDIR64B instruction is used to submit descriptors to
+  this type of WQ. The instruction is a posted write, so the
+  submitter must ensure that outstanding submissions do not exceed
+  the WQ size. The use of PASID is optional with DWQ. Multiple
+  clients can share a DWQ, but synchronization between them is
+  required because submissions to a full WQ are silently dropped.
+
+* Shared WQ (SWQ)
+
+ Multiple clients can submit work to this WQ. The submitter must use
+  ENQCMDS (from supervisor mode) or ENQCMD (from user mode). These
+ instructions will indicate via EFLAGS.ZF bit whether a submission
+ succeeds. The use of PASID is mandatory to identify the address space
+ of each client.
+
+
+For more information about the new instructions, see [1][2].
+
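+As an illustration only (not part of this series), a minimal sketch of
+submission helpers built on these instructions; the encodings follow the
+instruction set reference [1], and EFLAGS.ZF reports SWQ acceptance::
+
+  /* DWQ submission: MOVDIR64B is a posted 64-byte atomic write. */
+  static inline void submit_dwq(void __iomem *portal, const void *desc)
+  {
+          asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"
+                       : : "a" (portal), "d" (desc) : "memory");
+  }
+
+  /* SWQ submission: ENQCMDS sets ZF if the WQ rejects the descriptor. */
+  static inline int submit_swq(void __iomem *portal, const void *desc)
+  {
+          bool zf;
+
+          asm volatile(".byte 0xf3, 0x0f, 0x38, 0xf8, 0x02"
+                       : "=@ccz" (zf)
+                       : "a" (portal), "d" (desc) : "memory");
+          return zf ? -EAGAIN : 0;
+  }
+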
+The IDXD driver is broken down into the following usages:
+* In kernel interface through dmaengine subsystem API.
+* Userspace DMA support through character device. mmap(2) is utilized
+ to map directly to mmio address (or portals) for descriptor submission.
+* VFIO mediated device (mdev) supporting device passthrough usages. This
+  document covers only the mdev usage.
+
+
+=================================
+Assignable Device Interface (ADI)
+=================================
+The term ADI is used to represent the minimal unit of assignment for
+an Intel Scalable IOV device. Each ADI instance refers to the set of device
+backend resources that are allocated, configured and organized as an
+isolated unit.
+
+Intel DSA defines each WQ as an ADI. The MMIO registers of each work queue
+are partitioned into two categories:
+* MMIO registers accessed for data-path operations.
+* MMIO registers accessed for control-path operations.
+
+Data-path MMIO registers of each WQ are contained within
+one or more system page size aligned regions and can be mapped in the
+CPU page table for direct access from the guest. Control-path MMIO
+registers of all WQs are located together but segregated from data-path
+MMIO regions. Therefore, guest updates to control-path registers must
+be intercepted and then go through the host driver to be reflected in
+the device.
+
+Data-path MMIO registers of DSA WQ are portals for submitting descriptors
+to the device. There are four portals per WQ, each being 64 bytes
+in size and located on a separate 4KB page in BAR2. Each portal has
+different implications regarding interrupt message type (MSI vs. IMS)
+and occupancy control (limited vs. unlimited). It is not necessary to
+map all portals to the guest.
+
+Control-path MMIO registers of DSA WQ include global configurations
+(shared by all WQs) and WQ-specific configurations. The owner
+(e.g. the guest) of the WQ is expected to only change WQ-specific
+configurations. Intel DSA spec introduces a “Configuration Support”
+capability which, if cleared, indicates that some fields of WQ
+configuration registers are read-only and the WQ configuration is
+pre-configured by the host.
+
+
+Interrupt Message Store (IMS)
+-----------------------------
+The ADI utilizes Interrupt Message Store (IMS), a device-specific MSI
+implementation, instead of MSI-X, for guest interrupts. This
+preserves MSI-X for host usage and also allows a significantly larger
+number of interrupt vectors to support a large number of guests.
+
+Intel DSA device implements IMS as on-device memory mapped unified
+storage. Each interrupt message is stored as a DWORD size data payload
+and a 64-bit address (same as MSI-X). Access to the IMS is through the
+host idxd driver.
+
+The idxd driver makes use of the generic IMS irq chip and domain which
+stores the interrupt messages in an array in device memory. Allocation and
+freeing of interrupts happens via the generic msi_domain_alloc/free_irqs()
+interface. The driver only needs to ensure the interrupt domain is stored in
+the underlying device struct.
+
+
+ADI Isolation
+-------------
+Operations or functioning of one ADI must not affect the functioning
+of another ADI or the physical device. Upstream memory requests from
+different ADIs are distinguished using a Process Address Space Identifier
+(PASID). With the support of PASID-granular address translation in Intel
+VT-d, the address space targeted by a request from ADI can be a Host
+Virtual Address (HVA), Host I/O Virtual Address (HIOVA), Guest Physical
+Address (GPA), Guest Virtual Address (GVA), Guest I/O Virtual Address
+(GIOVA), etc. The PASID identity for an ADI is expected to be accessed
+or modified by privileged software through the host driver.
+
+=========================
+Virtual DSA (vDSA) Device
+=========================
+The DSA WQ itself is not a PCI device thus must be composed into a
+virtual DSA device to the guest.
+
+The composition logic needs to handle four main requirements:
+* Emulate PCI config space.
+* Map data-path portals for direct access from the guest.
+* Emulate control-path MMIO registers and selectively forward WQ
+ configuration requests through host driver to the device.
+* Forward and emulate WQ interrupts to the guest.
+
+The composition logic tells the guest which aspects of the WQ are
+configurable through a combination of capability fields, e.g.:
+* Configuration Support (if cleared, most aspects are not modifiable).
+* WQ Mode Support (if cleared, cannot change between dedicated and
+ shared mode).
+* Dedicated Mode Support.
+* Shared Mode Support.
+* ...
+
+The virtual capability fields are set according to the vDSA
+type. Following is an example of vDSA types and related WQ configurability:
+* Type ‘1dwq-v1’
+ * One DSA gen1 dedicated WQ to this guest
+ * Guest cannot share the WQ between its clients (no guest SVA)
+ * Guest cannot change any WQ configuration
+
+Besides, the composition logic also needs to serve administrative commands
+(through the virtual CMD register) via the host driver, including:
+* Drain/abort all descriptors submitted by this guest.
+* Drain/abort descriptors associated with a PASID.
+* Enable/disable/reset the WQ (when it’s not shared by multiple VMs).
+* Request interrupt handle.
+
+With this design, vDSA emulation is **greatly simplified**. Most
+registers are emulated in simple READ-ONLY flavor, and handling limited
+configurability is required only for a few registers.
+
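+A condensed sketch of the resulting emulation pattern, using names from
+the patches in this series::
+
+  /* Reads have no side effects: copy from the shadow BAR0. */
+  int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf,
+                      unsigned int size)
+  {
+          memcpy(buf, vidxd->bar0 + (pos & (vidxd->bar_size[0] - 1)), size);
+          return 0;
+  }
+
+  /* Only the few registers with side effects need interception. */
+  int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf,
+                       unsigned int size)
+  {
+          u32 offset = pos & (vidxd->bar_size[0] - 1);
+
+          switch (offset) {
+          case IDXD_CMD_OFFSET:
+                  /* administrative command: emulate in the host driver */
+                  vidxd_do_command(vidxd, get_reg_val(buf, size));
+                  break;
+          default:
+                  /* everything else is plain storage or read-only */
+                  break;
+          }
+          return 0;
+  }
+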
+===========================
+VFIO mdev vs. userspace DMA
+===========================
+There are two avenues to support vDSA composition.
+1. VFIO mediated device (mdev)
+2. Userspace DMA through char device
+
+VFIO mdev provides a generic subdevice passthrough framework. Unified
+uAPIs are used for both device and subdevice passthrough, thus any
+userspace VMM which already supports VFIO device passthrough would
+naturally support mdev/subdevice passthrough. The implication of VFIO
+mdev is putting emulation of device interface in the kernel (part of
+host driver) which must be carefully scrutinized. Fortunately, vDSA
+composition includes only a small portion of emulation code, due to the
+fact that most registers are simply READ-ONLY to the guest. The majority
+logic of handling limited configurability and administrative commands
+is anyway required to sit in the kernel, regardless of which kernel uAPI
+is pursued. In this regard, VFIO mdev is a nice fit for vDSA composition.
+
+IDXD driver provides a char device interface for applications to
+map the WQ portal and directly submit descriptors to do DMA. This
+interface provides only data-path access to userspace and relies on
+the host driver to handle control-path configurations. Expanding such
+interface to support subdevice passthrough allows moving the emulation
+code to userspace. However, quite some work is required to grow it from
+an application-oriented interface into a passthrough-oriented interface:
+new uAPIs to handle guest WQ configurability and administrative commands,
+and new uAPIs to handle passthrough specific requirements (e.g. DMA map,
+guest SVA, live migration, posted interrupt, etc.). And once it is done,
+every userspace VMM has to explicitly bind to IDXD specific uAPI, even
+though the real user is in the guest (instead of the VMM itself) in the
+passthrough scenario.
+
+Although some generalization might be possible to reduce the work of
+handling passthrough, we feel the difference between userspace DMA
+and subdevice passthrough is distinct in IDXD. Therefore, we choose to
+build vDSA composition on top of VFIO mdev framework and leave userspace
+DMA intact after discussion at LPC 2020.
+
+=============================
+Host Registration and Release
+=============================
+
+Intel DSA reports support for Intel Scalable IOV via a PCI Express
+Designated Vendor Specific Extended Capability (DVSEC). In addition,
+PASID-granular address translation capability is required in the
+IOMMU. During host initialization, the IDXD driver should check the
+presence of both capabilities before calling mdev_register_device()
+to register with the VFIO mdev framework and provide a set of ops
+(struct mdev_parent_ops). The IOMMU capability is indicated by the
+IOMMU_DEV_FEAT_AUX feature flag with iommu_dev_has_feature() and enabled
+with iommu_dev_enable_feature().
+
+On release, iommu_dev_disable_feature() is called after
+mdev_unregister_device() to disable the IOMMU_DEV_FEAT_AUX flag that
+the driver enabled during host initialization.
+
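+A condensed sketch of this flow, mirroring idxd_mdev_host_init() from
+this series::
+
+  if (!iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX))
+          return -EOPNOTSUPP;
+
+  rc = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX);
+  if (rc < 0)
+          return rc;
+
+  rc = mdev_register_device(dev, &idxd_vdcm_ops);
+  if (rc < 0) {
+          iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_AUX);
+          return rc;
+  }
+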
+The mdev_parent_ops data structure is filled out by the driver to provide
+a number of ops called by VFIO mdev framework::
+
+ struct mdev_parent_ops {
+ .supported_type_groups
+ .create
+ .remove
+ .open
+ .release
+ .read
+ .write
+ .mmap
+ .ioctl
+ };
+
+Supported_type_groups
+---------------------
+At the moment only one vDSA type is supported.
+
+“1dwq-v1”:
+ Single dedicated WQ (DSA 1.0) with read-only configuration exposed to
+ the guest. On the guest kernel, a vDSA device shows up with a single
+ WQ that is pre-configured by the host. The configuration for the WQ
+ is entirely read-only and cannot be reconfigured. There is no support
+ of guest SVA on this WQ.
+
+ The interrupt vector 0 is emulated by the host driver to support the admin
+ command completion and error reporting. A second interrupt vector is
+  bound to the IMS and used for I/O operations. In this implementation,
+ there are only two vectors being supported.
+
+create
+------
+API function to create the mdev. mdev_set_iommu_device() is called to
+associate the mdev device to the parent PCI device. This function is
+where the driver sets up and initializes the resources to support a single
+mdev device. This is triggered through sysfs to initiate the creation.
+
+remove
+------
+API function that mirrors the create() function and releases all the
+resources backing the mdev. This is also triggered through sysfs.
+
+open
+----
+API function that is called down from VFIO userspace to indicate to the
+driver that the upper layers are ready to claim and utilize the mdev. IMS
+entries are allocated and setup here.
+
+release
+-------
+The mirror function to open(); called when VFIO userspace releases the mdev.
+
+read / write
+------------
+This is where the Intel IDXD driver provides read/write emulation of
+PCI config space and MMIO registers. These paths are the “slow” path
+of the mediated device and emulation is used rather than direct access
+to the hardware resources. Typically configuration and administrative
+commands go through this path. This allows the mdev to show up as a
+virtual PCI device on the guest kernel.
+
+The emulation of PCI config space is nothing special; it is simply
+copied from kvmgt. In the future this part might be consolidated to
+reduce duplication.
+
+Emulating MMIO reads is a simple memory copy. There are no side effects
+to be emulated upon guest reads.
+
+Emulating MMIO writes is required only for a few registers, due to the
+read-only configuration of the ‘1dwq-v1’ type. The majority of the
+composition logic is hooked into the CMD register for performing
+administrative commands such as WQ drain, abort, enable, disable and
+reset operations. The rest of
+the emulation is about handling errors (GENCTRL/SWERROR) and interrupts
+(INTCAUSE/MSIXPERM) on the vDSA device. Future mdev types might allow
+limited WQ configurability, which then requires additional emulation of
+the WQCFG register.
+
+mmap
+----
+This function sets up the mapping that exposes a portion of the
+hardware, known as the portals, to the guest for direct “fast” path
+access through the mmap() syscall. A limited region of the hardware
+is mapped to the guest for direct I/O submission.
+
+There are four portals per WQ: unlimited MSI-X, limited MSI-X, unlimited
+IMS, limited IMS. Descriptors submitted to limited portals are subject
+to threshold configuration limitations for shared WQs. The MSI-X portals
+are used for host submissions, and the IMS portals are mapped to the VM
+for guest submissions.
+
+ioctl
+-----
+This API function does several things:
+* Provides general device information to VFIO userspace.
+* Provides device region information (PCI, MMIO, etc.).
+* Provides interrupt information.
+* Sets up interrupts for the mediated device.
+* Resets the mdev device.
+
+For the Intel idxd driver, Interrupt Message Store (IMS) vectors are
+used for mdev interrupts rather than MSI-X vectors. IMS provides additional
+interrupt vectors outside of the PCI MSI-X specification in order to support
+significantly more vectors. The emulated interrupt (0) is connected through
+kernel eventfd. When interrupt 0 needs to be asserted, the driver will
+signal the eventfd to trigger vector 0 interrupt on the guest.
+The IMS interrupts are set up via eventfd as well. However, they utilize
+the irq bypass manager to inject the interrupt directly into the guest.
+
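+A condensed sketch of the two injection paths, using names from this
+series::
+
+  /* Vector 0 (emulated): signal the eventfd from VFIO userspace. */
+  eventfd_signal(vidxd->vdev.msix_trigger[0], 1);
+
+  /* IMS vectors: register an irq bypass producer so a consumer
+   * (e.g. KVM) can inject the host IRQ directly into the guest. */
+  producer->token = trigger;              /* guest eventfd as the token */
+  producer->irq = irq_entry->irq;         /* host IRQ backing the IMS slot */
+  rc = irq_bypass_register_producer(producer);
+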
+To allocate IMS, we utilize the IMS array APIs. On host init, we need
+to create the MSI domain::
+
+ struct ims_array_info ims_info;
+ struct device *dev = &pci_dev->dev;
+
+
+ /* assign the device IMS size */
+ ims_info.max_slots = max_ims_size;
+ /* assign the MMIO base address for the IMS table */
+ ims_info.slots = mmio_base + ims_offset;
+ /* assign the MSI domain to the device */
+ dev->msi_domain = pci_ims_array_create_msi_irq_domain(pci_dev, &ims_info);
+
+When we are ready to allocate the interrupts::
+
+ struct device *dev = mdev_dev(mdev);
+
+ irq_domain = pci_dev->dev.msi_domain;
+ /* the irqs are allocated against device of mdev */
+ rc = msi_domain_alloc_irqs(irq_domain, dev, num_vecs);
+
+
+ /* we can retrieve the slot index from msi_entry */
+ for_each_msi_entry(entry, dev) {
+ slot_index = entry->device_msi.hwirq;
+ irq = entry->irq;
+ }
+
+ request_irq(irq, interrupt_handler_function, 0, “ims”, context);
+
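+On teardown, the mirror calls release the vectors (sketch)::
+
+  free_irq(irq, context);
+  msi_domain_free_irqs(irq_domain, dev);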
+
+The DSA device is structured such that MSI-X table entry 0 is used for
+admin commands completion, error reporting, and other misc commands. The
+remaining MSI-X table entries are used for WQ completion. For VM support,
+the virtual device also presents a similar layout. Therefore, vector 0
+is emulated by the software. Additional vector(s) are associated with IMS.
+
+The index (slot) for the per device IMS entry is managed by the MSI
+core. The index is the “interrupt handle” that the guest kernel
+needs to program into a DMA descriptor. That interrupt handle tells the
+hardware which IMS vector to trigger the interrupt on for the host.
+
+The virtual device presents an admin command called “request interrupt
+handle” that is not supported by the physical device. On probe of
+the DSA device on the guest kernel, the guest driver will issue the
+“request interrupt handle” command in order to get the interrupt
+handle for descriptor programming. The host driver will return the
+assigned slot for the IMS entry table to the guest.
+
+==========
+References
+==========
+[1] https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
+[2] https://software.intel.com/en-us/articles/intel-sdm
+[3] https://software.intel.com/sites/default/files/managed/cc/0e/intel-scalable-io-virtualization-technical-specification.pdf
+[4] https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification
diff --git a/MAINTAINERS b/MAINTAINERS
index c2114daa6bc7..ae34b0331eb4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8970,6 +8970,7 @@ INTEL IADX DRIVER
M: Dave Jiang <[email protected]>
L: [email protected]
S: Supported
+F: Documentation/driver-api/vfio/mdev-idxd.rst
F: drivers/dma/idxd/*
F: include/uapi/linux/idxd.h
In preparation for VFIO mediated device support in the idxd driver,
enabling for Interrupt Message Store (IMS) interrupts is added to the idxd
driver. With IMS support the idxd driver can dynamically allocate interrupts
on a per-mdev basis, based on how many IMS vectors are mapped to the mdev
device. This commit only provides the detection functions in the base driver
and not the VFIO mdev code utilization.
The commit has some portal related changes. A "portal" is a special
location within the MMIO BAR2 of the DSA device where descriptors are
submitted via the CPU instruction MOVDIR64B or ENQCMD(S). The offset of the
portal address determines whether the submitted descriptor is for MSI-X
or IMS notification.
See Intel SIOV spec for more details:
https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
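For illustration, the portal layout implied by the offset helpers in this
patch, assuming the enum values used by the driver (unlimited = 0,
limited = 1; MSI-X = 0, IMS = 1):

	/* from this patch */
	offset = prot * 0x1000 + irq_type * 0x2000;

	/*
	 * wq 0: unlimited/MSI-X = 0x0000, limited/MSI-X = 0x1000,
	 *       unlimited/IMS   = 0x2000, limited/IMS   = 0x3000
	 * Each subsequent WQ starts 4 pages (16KB) further into BAR2,
	 * per idxd_get_wq_portal_full_offset().
	 */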
Signed-off-by: Dave Jiang <[email protected]>
---
Documentation/ABI/stable/sysfs-driver-dma-idxd | 6 ++++++
drivers/dma/idxd/cdev.c | 4 ++--
drivers/dma/idxd/device.c | 2 +-
drivers/dma/idxd/idxd.h | 13 +++++++++----
drivers/dma/idxd/init.c | 19 +++++++++++++++++++
drivers/dma/idxd/registers.h | 7 +++++++
drivers/dma/idxd/sysfs.c | 9 +++++++++
7 files changed, 53 insertions(+), 7 deletions(-)
diff --git a/Documentation/ABI/stable/sysfs-driver-dma-idxd b/Documentation/ABI/stable/sysfs-driver-dma-idxd
index 55285c136cf0..95cd7975f488 100644
--- a/Documentation/ABI/stable/sysfs-driver-dma-idxd
+++ b/Documentation/ABI/stable/sysfs-driver-dma-idxd
@@ -129,6 +129,12 @@ KernelVersion: 5.10.0
Contact: [email protected]
Description: The last executed device administrative command's status/error.
+What: /sys/bus/dsa/devices/dsa<m>/ims_size
+Date: Oct 15, 2020
+KernelVersion: 5.11.0
+Contact: [email protected]
+Description: The total number of vectors available for Interrupt Message Store.
+
What: /sys/bus/dsa/devices/wq<m>.<n>/block_on_fault
Date: Oct 27, 2020
KernelVersion: 5.11.0
diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index 0db9b82ed8cf..b1518106434f 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -205,8 +205,8 @@ static int idxd_cdev_mmap(struct file *filp, struct vm_area_struct *vma)
return rc;
vma->vm_flags |= VM_DONTCOPY;
- pfn = (base + idxd_get_wq_portal_full_offset(wq->id,
- IDXD_PORTAL_LIMITED)) >> PAGE_SHIFT;
+ pfn = (base + idxd_get_wq_portal_full_offset(wq->id, IDXD_PORTAL_LIMITED,
+ IDXD_IRQ_MSIX)) >> PAGE_SHIFT;
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
vma->vm_private_data = ctx;
diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index 205156afeb54..d6c447d09a6f 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -290,7 +290,7 @@ int idxd_wq_map_portal(struct idxd_wq *wq)
resource_size_t start;
start = pci_resource_start(pdev, IDXD_WQ_BAR);
- start += idxd_get_wq_portal_full_offset(wq->id, IDXD_PORTAL_LIMITED);
+ start += idxd_get_wq_portal_full_offset(wq->id, IDXD_PORTAL_LIMITED, IDXD_IRQ_MSIX);
wq->portal = devm_ioremap(dev, start, IDXD_PORTAL_SIZE);
if (!wq->portal)
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index a9386a66ab72..90c9458903e1 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -163,6 +163,7 @@ enum idxd_device_flag {
IDXD_FLAG_CONFIGURABLE = 0,
IDXD_FLAG_CMD_RUNNING,
IDXD_FLAG_PASID_ENABLED,
+ IDXD_FLAG_IMS_SUPPORTED,
};
struct idxd_device {
@@ -190,6 +191,7 @@ struct idxd_device {
int num_groups;
+ u32 ims_offset;
u32 msix_perm_offset;
u32 wqcfg_offset;
u32 grpcfg_offset;
@@ -197,6 +199,7 @@ struct idxd_device {
u64 max_xfer_bytes;
u32 max_batch_size;
+ int ims_size;
int max_groups;
int max_engines;
int max_tokens;
@@ -279,15 +282,17 @@ enum idxd_interrupt_type {
IDXD_IRQ_IMS,
};
-static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot)
+static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot,
+ enum idxd_interrupt_type irq_type)
{
- return prot * 0x1000;
+ return prot * 0x1000 + irq_type * 0x2000;
}
static inline int idxd_get_wq_portal_full_offset(int wq_id,
- enum idxd_portal_prot prot)
+ enum idxd_portal_prot prot,
+ enum idxd_interrupt_type irq_type)
{
- return ((wq_id * 4) << PAGE_SHIFT) + idxd_get_wq_portal_offset(prot);
+ return ((wq_id * 4) << PAGE_SHIFT) + idxd_get_wq_portal_offset(prot, irq_type);
}
static inline void idxd_set_type(struct idxd_device *idxd)
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 0c982337ef84..ee56b92108d8 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -254,10 +254,28 @@ static void idxd_read_table_offsets(struct idxd_device *idxd)
dev_dbg(dev, "IDXD Work Queue Config Offset: %#x\n", idxd->wqcfg_offset);
idxd->msix_perm_offset = offsets.msix_perm * IDXD_TABLE_MULT;
dev_dbg(dev, "IDXD MSIX Permission Offset: %#x\n", idxd->msix_perm_offset);
+ idxd->ims_offset = offsets.ims * IDXD_TABLE_MULT;
+ dev_dbg(dev, "IDXD IMS Offset: %#x\n", idxd->ims_offset);
idxd->perfmon_offset = offsets.perfmon * IDXD_TABLE_MULT;
dev_dbg(dev, "IDXD Perfmon Offset: %#x\n", idxd->perfmon_offset);
}
+static void idxd_check_ims(struct idxd_device *idxd)
+{
+ struct pci_dev *pdev = idxd->pdev;
+
+ /* verify that we have IMS vectors supported by device */
+ if (idxd->hw.gen_cap.max_ims_mult) {
+ idxd->ims_size = idxd->hw.gen_cap.max_ims_mult * 256ULL;
+ dev_dbg(&pdev->dev, "IMS size: %u\n", idxd->ims_size);
+ set_bit(IDXD_FLAG_IMS_SUPPORTED, &idxd->flags);
+ dev_dbg(&pdev->dev, "IMS supported for device\n");
+ return;
+ }
+
+ dev_dbg(&pdev->dev, "IMS unsupported for device\n");
+}
+
static void idxd_read_caps(struct idxd_device *idxd)
{
struct device *dev = &idxd->pdev->dev;
@@ -276,6 +294,7 @@ static void idxd_read_caps(struct idxd_device *idxd)
dev_dbg(dev, "max xfer size: %llu bytes\n", idxd->max_xfer_bytes);
idxd->max_batch_size = 1U << idxd->hw.gen_cap.max_batch_shift;
dev_dbg(dev, "max batch size: %u\n", idxd->max_batch_size);
+ idxd_check_ims(idxd);
if (idxd->hw.gen_cap.config_en)
set_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags);
diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h
index 5cbf368c7367..c97f700bcf34 100644
--- a/drivers/dma/idxd/registers.h
+++ b/drivers/dma/idxd/registers.h
@@ -385,4 +385,11 @@ union wqcfg {
#define GRPENGCFG_OFFSET(idxd_dev, n) ((idxd_dev)->grpcfg_offset + (n) * GRPCFG_SIZE + 32)
#define GRPFLGCFG_OFFSET(idxd_dev, n) ((idxd_dev)->grpcfg_offset + (n) * GRPCFG_SIZE + 40)
+#define PCI_EXT_CAP_ID_DVSEC 0x23 /* Designated Vendor-Specific */
+#define PCI_DVSEC_HEADER1 0x4 /* Designated Vendor-Specific Header1 */
+#define PCI_DVSEC_HEADER2 0x8 /* Designated Vendor-Specific Header2 */
+#define PCI_DVSEC_ID_INTEL_SIOV 0x0005
+#define PCI_DVSEC_INTEL_SIOV_CAP 0x0014
+#define PCI_DVSEC_INTEL_SIOV_CAP_IMS 0x00000001
+
#endif
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index 21c1e23cdf23..ab5c76e1226b 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -1444,6 +1444,14 @@ static ssize_t numa_node_show(struct device *dev,
}
static DEVICE_ATTR_RO(numa_node);
+static ssize_t ims_size_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ struct idxd_device *idxd = container_of(dev, struct idxd_device, conf_dev);
+
+ return sprintf(buf, "%u\n", idxd->ims_size);
+}
+static DEVICE_ATTR_RO(ims_size);
+
static ssize_t max_batch_size_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
@@ -1639,6 +1647,7 @@ static struct attribute *idxd_device_attributes[] = {
&dev_attr_max_work_queues_size.attr,
&dev_attr_max_engines.attr,
&dev_attr_numa_node.attr,
+ &dev_attr_ims_size.attr,
&dev_attr_max_batch_size.attr,
&dev_attr_max_transfer_size.attr,
&dev_attr_op_cap.attr,
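(A quick worked example of the offsets the updated helpers produce.
This is illustrative only; it assumes IDXD_PORTAL_UNLIMITED == 0,
IDXD_PORTAL_LIMITED == 1, IDXD_IRQ_MSIX == 0, IDXD_IRQ_IMS == 1 and
PAGE_SHIFT == 12, values that are not all visible in the hunks above:

	/* each wq owns 4 pages of portal space: (wq_id * 4) << PAGE_SHIFT */
	idxd_get_wq_portal_full_offset(0, IDXD_PORTAL_LIMITED, IDXD_IRQ_MSIX);
					/* 0x0 + 0x1000 + 0x0 = 0x1000 */
	idxd_get_wq_portal_full_offset(0, IDXD_PORTAL_LIMITED, IDXD_IRQ_IMS);
					/* 0x0 + 0x1000 + 0x2000 = 0x3000 */
	idxd_get_wq_portal_full_offset(1, IDXD_PORTAL_UNLIMITED, IDXD_IRQ_IMS);
					/* 0x4000 + 0x0 + 0x2000 = 0x6000 */
)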
On Fri, Feb 05, 2021 at 01:53:05PM -0700, Dave Jiang wrote:
> diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
> index 21c1e23cdf23..ab5c76e1226b 100644
> +++ b/drivers/dma/idxd/sysfs.c
> @@ -1444,6 +1444,14 @@ static ssize_t numa_node_show(struct device *dev,
> }
> static DEVICE_ATTR_RO(numa_node);
>
> +static ssize_t ims_size_show(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> + struct idxd_device *idxd = container_of(dev, struct idxd_device, conf_dev);
> +
> + return sprintf(buf, "%u\n", idxd->ims_size);
> +}
use sysfs_emit for all new sysfs functions please
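A minimal sketch of what that looks like for the function above
(sysfs_emit() bounds the write to PAGE_SIZE, unlike raw sprintf()):

static ssize_t ims_size_show(struct device *dev,
			     struct device_attribute *attr, char *buf)
{
	struct idxd_device *idxd = container_of(dev, struct idxd_device,
						conf_dev);

	return sysfs_emit(buf, "%u\n", idxd->ims_size);
}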
Jason
On 2/10/2021 4:30 PM, Jason Gunthorpe wrote:
> On Fri, Feb 05, 2021 at 01:53:05PM -0700, Dave Jiang wrote:
>
>> diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
>> index 21c1e23cdf23..ab5c76e1226b 100644
>> +++ b/drivers/dma/idxd/sysfs.c
>> @@ -1444,6 +1444,14 @@ static ssize_t numa_node_show(struct device *dev,
>> }
>> static DEVICE_ATTR_RO(numa_node);
>>
>> +static ssize_t ims_size_show(struct device *dev, struct device_attribute *attr, char *buf)
>> +{
>> + struct idxd_device *idxd = container_of(dev, struct idxd_device, conf_dev);
>> +
>> + return sprintf(buf, "%u\n", idxd->ims_size);
>> +}
> use sysfs_emit for all new sysfs functions please
Will fix. Thanks!
>
> Jason
On Fri, Feb 05, 2021 at 01:53:18PM -0700, Dave Jiang wrote:
> diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
> index a2438b3166db..f02c96164515 100644
> +++ b/drivers/dma/idxd/idxd.h
> @@ -8,6 +8,7 @@
> #include <linux/percpu-rwsem.h>
> #include <linux/wait.h>
> #include <linux/cdev.h>
> +#include <linux/auxiliary_bus.h>
> #include "registers.h"
>
> #define IDXD_DRIVER_VERSION "1.00"
> @@ -221,6 +222,8 @@ struct idxd_device {
> struct work_struct work;
>
> int *int_handles;
> +
> + struct auxiliary_device *mdev_auxdev;
> };
If there is only one aux device there is not much reason to make it a
dedicated allocation.
> /* IDXD software descriptor */
> @@ -282,6 +285,10 @@ enum idxd_interrupt_type {
> IDXD_IRQ_IMS,
> };
>
> +struct idxd_mdev_aux_drv {
> + struct auxiliary_driver auxiliary_drv;
> +};
Wrong indent. What is this even for?
> +
> static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot,
> enum idxd_interrupt_type irq_type)
> {
> diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
> index ee56b92108d8..fd57f39e4b7d 100644
> +++ b/drivers/dma/idxd/init.c
> @@ -382,6 +382,74 @@ static void idxd_disable_system_pasid(struct idxd_device *idxd)
> idxd->sva = NULL;
> }
>
> +static void idxd_remove_mdev_auxdev(struct idxd_device *idxd)
> +{
> + if (!IS_ENABLED(CONFIG_VFIO_MDEV_IDXD))
> + return;
> +
> + auxiliary_device_delete(idxd->mdev_auxdev);
> + auxiliary_device_uninit(idxd->mdev_auxdev);
> +}
> +
> +static void idxd_auxdev_release(struct device *dev)
> +{
> + struct auxiliary_device *auxdev = to_auxiliary_dev(dev);
> + struct idxd_device *idxd = dev_get_drvdata(dev);
Nope, where did you see drvdata being used like this? You need to use
container_of.
If you make mdev_auxdev a non-pointer member then this is just:
struct idxd_device *idxd = container_of(dev, struct idxd_device, mdev_auxdev)
put_device(&idxd->conf_dev);
And fix the 'setup' to match this design
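Spelled out, with mdev_auxdev embedded the release callback becomes
something like this (a sketch, not the posted code):

static void idxd_auxdev_release(struct device *dev)
{
	struct auxiliary_device *auxdev = to_auxiliary_dev(dev);
	struct idxd_device *idxd = container_of(auxdev, struct idxd_device,
						mdev_auxdev);

	/* drop the reference the setup path took on conf_dev */
	put_device(&idxd->conf_dev);
}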
> + kfree(auxdev->name);
This is weird, the name shouldn't be allocated, it is supposed to be a
fixed string to make it easy to find the driver name in the code base.
> +static int idxd_setup_mdev_auxdev(struct idxd_device *idxd)
> +{
> + struct auxiliary_device *auxdev;
> + struct device *dev = &idxd->pdev->dev;
> + int rc;
> +
> + if (!IS_ENABLED(CONFIG_VFIO_MDEV_IDXD))
> + return 0;
> +
> + auxdev = kzalloc(sizeof(*auxdev), GFP_KERNEL);
> + if (!auxdev)
> + return -ENOMEM;
> +
> + auxdev->name = kasprintf(GFP_KERNEL, "mdev-%s", idxd_name[idxd->type]);
> + if (!auxdev->name) {
> + rc = -ENOMEM;
> + goto err_name;
> + }
> +
> + dev_dbg(&idxd->pdev->dev, "aux dev mdev: %s\n", auxdev->name);
> +
> + auxdev->dev.parent = dev;
> + auxdev->dev.release = idxd_auxdev_release;
> + auxdev->id = idxd->id;
> +
> + rc = auxiliary_device_init(auxdev);
> + if (rc < 0) {
> + dev_err(dev, "Failed to init aux dev: %d\n", rc);
> + goto err_auxdev;
> + }
Put the init earlier so it can handle the error unwinds
> + rc = auxiliary_device_add(auxdev);
> + if (rc < 0) {
> + dev_err(dev, "Failed to add aux dev: %d\n", rc);
> + goto err_auxdev;
> + }
> +
> + idxd->mdev_auxdev = auxdev;
> + dev_set_drvdata(&auxdev->dev, idxd);
No to using drvdata, and this is in the wrong order anyhow.
> + return 0;
> +
> + err_auxdev:
> + kfree(auxdev->name);
> + err_name:
> + kfree(auxdev);
> + return rc;
> +}
> +
> static int idxd_probe(struct idxd_device *idxd)
> {
> struct pci_dev *pdev = idxd->pdev;
> @@ -434,11 +502,19 @@ static int idxd_probe(struct idxd_device *idxd)
> goto err_idr_fail;
> }
>
> + rc = idxd_setup_mdev_auxdev(idxd);
> + if (rc < 0)
> + goto err_auxdev_fail;
> +
> idxd->major = idxd_cdev_get_major(idxd);
>
> dev_dbg(dev, "IDXD device %d probed successfully\n", idxd->id);
> return 0;
>
> + err_auxdev_fail:
> + mutex_lock(&idxd_idr_lock);
> + idr_remove(&idxd_idrs[idxd->type], idxd->id);
> + mutex_unlock(&idxd_idr_lock);
Probably wrong to order things like this..
Also somehow this has a
idxd = devm_kzalloc(dev, sizeof(struct idxd_device), GFP_KERNEL);
but the idxd has a kref'd struct device in it:
struct idxd_device {
enum idxd_type type;
struct device conf_dev;
So that's not right either
You'll need to fix the lifetime model for idxd_device before you get
to adding auxdevices
> +static int idxd_mdev_host_init(struct idxd_device *idxd)
> +{
> + /* FIXME: Fill in later */
> + return 0;
> +}
> +
> +static int idxd_mdev_host_release(struct idxd_device *idxd)
> +{
> + /* FIXME: Fill in later */
> + return 0;
> +}
Don't leave empty stubs like this, just provide the whole driver in
the next patch
> +static int idxd_mdev_aux_probe(struct auxiliary_device *auxdev,
> + const struct auxiliary_device_id *id)
> +{
> + struct idxd_device *idxd = dev_get_drvdata(&auxdev->dev);
Continuing no to using drvdata, must use container_of
> + int rc;
> +
> + rc = idxd_mdev_host_init(idxd);
And why add this indirection? Just write it here
> +static struct idxd_mdev_aux_drv idxd_mdev_aux_drv = {
> + .auxiliary_drv = {
> + .id_table = idxd_mdev_auxbus_id_table,
> + .probe = idxd_mdev_aux_probe,
> + .remove = idxd_mdev_aux_remove,
> + },
> +};
Why idxd_mdev_aux_drv? Does a later patch add something here?
> +static int idxd_mdev_auxdev_drv_register(struct idxd_mdev_aux_drv *drv)
> +{
> + return auxiliary_driver_register(&drv->auxiliary_drv);
> +}
> +
> +static void idxd_mdev_auxdev_drv_unregister(struct idxd_mdev_aux_drv *drv)
> +{
> + auxiliary_driver_unregister(&drv->auxiliary_drv);
> +}
> +
> +module_driver(idxd_mdev_aux_drv, idxd_mdev_auxdev_drv_register, idxd_mdev_auxdev_drv_unregister);
There is an auxiliary driver macro that does this boilerplate
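That would be module_auxiliary_driver(); with the wrapper struct gone,
the register/unregister stanzas above collapse to roughly:

static struct auxiliary_driver idxd_mdev_aux_drv = {
	.id_table = idxd_mdev_auxbus_id_table,
	.probe = idxd_mdev_aux_probe,
	.remove = idxd_mdev_aux_remove,
};

module_auxiliary_driver(idxd_mdev_aux_drv);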
Jason
On Fri, Feb 05, 2021 at 01:53:37PM -0700, Dave Jiang wrote:
> -static const struct mdev_parent_ops idxd_vdcm_ops = {
> +static ssize_t name_show(struct kobject *kobj, struct device *dev, char *buf)
> +{
> + struct vdcm_idxd_type *type;
> +
> + type = idxd_vdcm_find_vidxd_type(dev, kobject_name(kobj));
> +
> + if (type)
> + return sprintf(buf, "%s\n", type->name);
> +
> + return -EINVAL;
Success-oriented flow, please.
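I.e. check the failure case first and keep the success path last and
unindented, something like (a sketch, using sysfs_emit() as requested
elsewhere in this series):

	if (!type)
		return -EINVAL;

	return sysfs_emit(buf, "%s\n", type->name);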
Jason
On 2/10/2021 4:46 PM, Jason Gunthorpe wrote:
> On Fri, Feb 05, 2021 at 01:53:18PM -0700, Dave Jiang wrote:
>> diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
>> index a2438b3166db..f02c96164515 100644
>> +++ b/drivers/dma/idxd/idxd.h
>> @@ -8,6 +8,7 @@
>> #include <linux/percpu-rwsem.h>
>> #include <linux/wait.h>
>> #include <linux/cdev.h>
>> +#include <linux/auxiliary_bus.h>
>> #include "registers.h"
>>
>> #define IDXD_DRIVER_VERSION "1.00"
>> @@ -221,6 +222,8 @@ struct idxd_device {
>> struct work_struct work;
>>
>> int *int_handles;
>> +
>> + struct auxiliary_device *mdev_auxdev;
>> };
> If there is only one aux device there is not much reason to make it a
> dedicated allocation.
Hi Jason. Thank you for the review. Very much appreciated!
Yep. I had it embedded and then changed it when I was working on the
UACCE bits to make it uniform. Should've just kept it the way it was.
>> /* IDXD software descriptor */
>> @@ -282,6 +285,10 @@ enum idxd_interrupt_type {
>> IDXD_IRQ_IMS,
>> };
>>
>> +struct idxd_mdev_aux_drv {
>> + struct auxiliary_driver auxiliary_drv;
>> +};
> Wrong indent. What is this even for?
Will remove.
>> +
>> static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot,
>> enum idxd_interrupt_type irq_type)
>> {
>> diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
>> index ee56b92108d8..fd57f39e4b7d 100644
>> +++ b/drivers/dma/idxd/init.c
>> @@ -382,6 +382,74 @@ static void idxd_disable_system_pasid(struct idxd_device *idxd)
>> idxd->sva = NULL;
>> }
>>
>> +static void idxd_remove_mdev_auxdev(struct idxd_device *idxd)
>> +{
>> + if (!IS_ENABLED(CONFIG_VFIO_MDEV_IDXD))
>> + return;
>> +
>> + auxiliary_device_delete(idxd->mdev_auxdev);
>> + auxiliary_device_uninit(idxd->mdev_auxdev);
>> +}
>> +
>> +static void idxd_auxdev_release(struct device *dev)
>> +{
>> + struct auxiliary_device *auxdev = to_auxiliary_dev(dev);
>> + struct idxd_device *idxd = dev_get_drvdata(dev);
> Nope, where did you see drvdata being used like this? You need to use
> container_of.
>
> If you make mdev_auxdev a non-pointer member then this is just:
>
> struct idxd_device *idxd = container_of(dev, struct idxd_device, mdev_auxdev)
>
> put_device(&idxd->conf_dev);
>
> And fix the 'setup' to match this design
Yes. Once it's embedded, everything falls into place. The drvdata hack
was to deal with the auxdev being a pointer.
>> + kfree(auxdev->name);
> This is weird, the name shouldn't be allocated, it is supposed to be a
> fixed string to make it easy to find the driver name in the code base.
Will fix.
>> +static int idxd_setup_mdev_auxdev(struct idxd_device *idxd)
>> +{
>> + struct auxiliary_device *auxdev;
>> + struct device *dev = &idxd->pdev->dev;
>> + int rc;
>> +
>> + if (!IS_ENABLED(CONFIG_VFIO_MDEV_IDXD))
>> + return 0;
>> +
>> + auxdev = kzalloc(sizeof(*auxdev), GFP_KERNEL);
>> + if (!auxdev)
>> + return -ENOMEM;
>> +
>> + auxdev->name = kasprintf(GFP_KERNEL, "mdev-%s", idxd_name[idxd->type]);
>> + if (!auxdev->name) {
>> + rc = -ENOMEM;
>> + goto err_name;
>> + }
>> +
>> + dev_dbg(&idxd->pdev->dev, "aux dev mdev: %s\n", auxdev->name);
>> +
>> + auxdev->dev.parent = dev;
>> + auxdev->dev.release = idxd_auxdev_release;
>> + auxdev->id = idxd->id;
>> +
>> + rc = auxiliary_device_init(auxdev);
>> + if (rc < 0) {
>> + dev_err(dev, "Failed to init aux dev: %d\n", rc);
>> + goto err_auxdev;
>> + }
> Put the init earlier so it can handle the error unwinds
I think with the auxdev embedded there's really not much going on, so
this resolves itself.
>> + rc = auxiliary_device_add(auxdev);
>> + if (rc < 0) {
>> + dev_err(dev, "Failed to add aux dev: %d\n", rc);
>> + goto err_auxdev;
>> + }
>> +
>> + idxd->mdev_auxdev = auxdev;
>> + dev_set_drvdata(&auxdev->dev, idxd);
> No to using drvdata, and this is in the wrong order anyhow.
>
>> + return 0;
>> +
>> + err_auxdev:
>> + kfree(auxdev->name);
>> + err_name:
>> + kfree(auxdev);
>> + return rc;
>> +}
>> +
>> static int idxd_probe(struct idxd_device *idxd)
>> {
>> struct pci_dev *pdev = idxd->pdev;
>> @@ -434,11 +502,19 @@ static int idxd_probe(struct idxd_device *idxd)
>> goto err_idr_fail;
>> }
>>
>> + rc = idxd_setup_mdev_auxdev(idxd);
>> + if (rc < 0)
>> + goto err_auxdev_fail;
>> +
>> idxd->major = idxd_cdev_get_major(idxd);
>>
>> dev_dbg(dev, "IDXD device %d probed successfully\n", idxd->id);
>> return 0;
>>
>> + err_auxdev_fail:
>> + mutex_lock(&idxd_idr_lock);
>> + idr_remove(&idxd_idrs[idxd->type], idxd->id);
>> + mutex_unlock(&idxd_idr_lock);
> Probably wrong to order things like this..
How should it be ordered?
>
> Also somehow this has a
>
> idxd = devm_kzalloc(dev, sizeof(struct idxd_device), GFP_KERNEL);
>
> but the idxd has a kref'd struct device in it:
So the conf_dev is a struct device that lets the driver do configuration
of the device and other components through sysfs. It's a child device of
the pdev and should have no relation to the auxdev. The conf_devs for
each component should not be released until the physical device is
released. For the mdev case, the auxdev likewise shouldn't be released
until the pdev is removed, since it is also a child of the pdev.
pdev --- device conf_dev --- wq conf_dev
     |                   |--- engine conf_dev
     |                   |--- group conf_dev
     |--- aux_dev
>
> struct idxd_device {
> enum idxd_type type;
> struct device conf_dev;
>
> So that's not right either
>
> You'll need to fix the lifetime model for idxd_device before you get
> to adding auxdevices
Can you kindly expand on how it's supposed to look, please?
>
>> +static int idxd_mdev_host_init(struct idxd_device *idxd)
>> +{
>> + /* FIXME: Fill in later */
>> + return 0;
>> +}
>> +
>> +static int idxd_mdev_host_release(struct idxd_device *idxd)
>> +{
>> + /* FIXME: Fill in later */
>> + return 0;
>> +}
> Don't leave empty stubs like this, just provide the whole driver in
> the next patch
Ok will do that.
>
>> +static int idxd_mdev_aux_probe(struct auxiliary_device *auxdev,
>> + const struct auxiliary_device_id *id)
>> +{
>> + struct idxd_device *idxd = dev_get_drvdata(&auxdev->dev);
> Continuing no to using drvdata, must use container_of
>
>> + int rc;
>> +
>> + rc = idxd_mdev_host_init(idxd);
> And why add this indirection? Just write it here
ok
>
>> +static struct idxd_mdev_aux_drv idxd_mdev_aux_drv = {
>> + .auxiliary_drv = {
>> + .id_table = idxd_mdev_auxbus_id_table,
>> + .probe = idxd_mdev_aux_probe,
>> + .remove = idxd_mdev_aux_remove,
>> + },
>> +};
> Why idxd_mdev_aux_drv? Does a later patch add something here?
Yes. A callback function is added there in a later patch. But I can
restructure so the wrapper struct is only introduced when it's actually
needed.
>
>> +static int idxd_mdev_auxdev_drv_register(struct idxd_mdev_aux_drv *drv)
>> +{
>> + return auxiliary_driver_register(&drv->auxiliary_drv);
>> +}
>> +
>> +static void idxd_mdev_auxdev_drv_unregister(struct idxd_mdev_aux_drv *drv)
>> +{
>> + auxiliary_driver_unregister(&drv->auxiliary_drv);
>> +}
>> +
>> +module_driver(idxd_mdev_aux_drv, idxd_mdev_auxdev_drv_register, idxd_mdev_auxdev_drv_unregister);
> There is an auxiliary driver macro that does this boilerplate
Ok thanks.
>
> Jason
On Fri, Feb 12, 2021 at 11:56:24AM -0700, Dave Jiang wrote:
> > > @@ -434,11 +502,19 @@ static int idxd_probe(struct idxd_device *idxd)
> > > goto err_idr_fail;
> > > }
> > > + rc = idxd_setup_mdev_auxdev(idxd);
> > > + if (rc < 0)
> > > + goto err_auxdev_fail;
> > > +
> > > idxd->major = idxd_cdev_get_major(idxd);
> > > dev_dbg(dev, "IDXD device %d probed successfully\n", idxd->id);
> > > return 0;
> > > + err_auxdev_fail:
> > > + mutex_lock(&idxd_idr_lock);
> > > + idr_remove(&idxd_idrs[idxd->type], idxd->id);
> > > + mutex_unlock(&idxd_idr_lock);
> > Probably wrong to order things like this..
>
> How should it be ordered?
The IDR is global data so some other thread could have read the IDR
and got this pointer, but now it is being torn down in some racy
way. It is best to make the store to global memory the very last
thing so you never have to unstore it on an error path and don't
have to think about concurrency.
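In other words, roughly this shape (a sketch; err_unwind_auxdev is a
made-up label, and auxdev->id would then have to be assigned after the
IDR allocation):

	rc = idxd_setup_mdev_auxdev(idxd);
	if (rc < 0)
		return rc;

	/* publish to the global IDR only after nothing else can fail */
	mutex_lock(&idxd_idr_lock);
	idxd->id = idr_alloc(&idxd_idrs[idxd->type], idxd, 0, 0, GFP_KERNEL);
	mutex_unlock(&idxd_idr_lock);
	if (idxd->id < 0) {
		rc = idxd->id;
		goto err_unwind_auxdev;
	}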
> > Also somehow this has a
> >
> > idxd = devm_kzalloc(dev, sizeof(struct idxd_device), GFP_KERNEL);
> >
> > but the idxd has a kref'd struct device in it:
>
> So the conf_dev is a struct device that lets the driver do configuration of
> the device and other components through sysfs. It's a child device of the
> pdev and should have no relation to the auxdev. The conf_devs for each
> component should not be released until the physical device is released. For
> the mdev case, the auxdev likewise shouldn't be released until the pdev is
> removed, since it is also a child of the pdev.
>
> pdev --- device conf_dev --- wq conf_dev
>      |                   |--- engine conf_dev
>      |                   |--- group conf_dev
>      |--- aux_dev
>
> >
> > struct idxd_device {
> > enum idxd_type type;
> > struct device conf_dev;
> >
> > So that's not right either
> >
> > You'll need to fix the lifetime model for idxd_device before you get
> > to adding auxdevices
>
> Can you kindly expand on how it's supposed to look, please?
Well, you can't call kfree on memory that contains a struct device,
you have to use put_device() - so the devm_kzalloc is unconditionally
wrong. Maybe you could replace it with a devm put_device action, but
it would probably be a lot saner to just put the required put_device's
where they need to be in the first place.
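For completeness, the devm variant would be along these lines (a
sketch; idxd_conf_dev_put is a made-up helper name):

static void idxd_conf_dev_put(void *data)
{
	put_device(data);
}

	/* in probe, after conf_dev is initialized: */
	rc = devm_add_action_or_reset(&pdev->dev, idxd_conf_dev_put,
				      &idxd->conf_dev);
	if (rc)
		return rc;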
I didn't try to work out what this was all for, but once it is sorted
out you can just embed the aux device here and chain its release to
put_device on the conf_dev and all the lifetime will work out
naturally.
Jason