2023-10-27 17:03:35

by Reinette Chatre

Subject: [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS)

Changes since RFC V2:
- RFC V2: https://lore.kernel.org/lkml/[email protected]/
- Still submitting this as an RFC series. I believe that this now matches the
expectations raised during earlier reviews. If you agree this is
the right direction then I can drop the RFC prefix on the next submission.
If you do not agree then please do let me know where I missed
expectations.
- First patch (PCI/MSI: Provide stubs for IMS functions)
has been submitted upstream separately and is queued for inclusion during
the next merge window. I still include it in this series to avoid
the noise from issues that bots would find when checking this series
without it.
https://lore.kernel.org/lkml/169757242009.3135.5502383859327174030.tip-bot2@tip-bot2/
- Eliminated duplicate code between the PCI passthrough device backend and
the IMS backend through more abstraction within the interrupt management
frontend. (Kevin)
- Emulated interrupts are now managed by the interrupt management
frontend and no longer unique to IMS. (Jason and Kevin)
- Since being an emulated interrupt is a persistent property, there is
a functional change to PCI interrupt management: per-interrupt
contexts (managed by the frontend) are now persistent (they remain
allocated until device release).
- Most of the patches from RFC V2 look the same with more patches
added to support the additional abstraction needed to eliminate the
duplicate code. The IMS support was refactored to benefit from the
new abstraction. Please refer to individual patches for specific changes.

Changes since RFC V1:
- RFC V1: https://lore.kernel.org/lkml/[email protected]/
- This is a complete rewrite based on feedback from Jason and Kevin.
The main change is to make IMS a new backend for MSI-X
emulation: VFIO PCI becomes an interrupt management frontend,
with the existing interrupt management for PCI passthrough devices as
one backend and IMS interrupt management introduced as a new backend.
The first part of the series splits VFIO PCI interrupt
management into a "frontend" and "backend" with the existing PCI
interrupt management as its first backend. The second part of the
series adds IMS interrupt management as a new interrupt management
backend.
This is a significant change from RFC V1 as well as in the impact of
the changes on existing VFIO PCI. This was done in response to
feedback that I hope I understood as intended. If I did not get it
right, please do point out to me where I went astray and I'd be
happy to rewrite. Of course, suggestions for improvement will
be much appreciated.

Hi Everybody,

With Interrupt Message Store (IMS) support introduced in
commit 0194425af0c8 ("PCI/MSI: Provide IMS (Interrupt Message Store)
support") a device can create a secondary interrupt domain that works
side by side with MSI-X on the same device. IMS allows for
implementation-specific interrupt storage that is managed by the
implementation-specific interrupt chip associated with the IMS domain
at the time the domain is created for the device via
pci_create_ims_domain().

An example usage of IMS is for devices that can have their resources
assigned to guests with varying granularity. For example, an
accelerator device may support many workqueues and a single workqueue
can be composed into a virtual device for use by a guest. Using
IMS interrupts for the guest preserves MSI-X for host usage while
allowing a significantly larger number of interrupt vectors than
MSI-X permits, all while enabling use of the same device driver
within the host and the guest.

This series introduces IMS support to VFIO PCI for use by
virtual devices whose MSI-X interrupts are backed by IMS
interrupts on the host. Specifically, this means that when the virtual
device's VFIO_DEVICE_SET_IRQS ioctl() receives a "trigger interrupt"
(VFIO_IRQ_SET_ACTION_TRIGGER) for an MSI-X index, VFIO PCI IMS
allocates/frees an IMS interrupt on the host.

VFIO PCI assumes that it is managing interrupts of a passthrough PCI
device. Split VFIO PCI into a "frontend" and "backend" to support
interrupt management for virtual devices that are not passthrough PCI
devices. The VFIO PCI frontend directs guest requests to the
appropriate backend. Existing interrupt management for passthrough PCI
devices forms the first backend; guest MSI-X interrupts backed by
IMS interrupts on the host form the new backend (VFIO PCI IMS).

An IMS interrupt is allocated via pci_ims_alloc_irq(), which requires
an implementation-specific cookie that is opaque to VFIO PCI IMS. This
can be a PASID, queue ID, pointer, etc. During initialization
VFIO PCI IMS learns which PCI device to operate on and what the
default cookie should be for any new interrupt allocation. VFIO PCI
IMS can also associate a unique cookie with each vector.

Guests may access a virtual device via both 'direct-path', where the
guest interacts directly with the underlying hardware, and 'intercepted
path', where the virtual device emulates operations. VFIO PCI
supports emulated interrupts (better naming suggestions are welcome) to
handle 'intercepted path' operations where completion interrupts are
signaled from the virtual device, not the underlying hardware.

This has been tested with a yet-to-be-published VFIO driver for the
Intel Data Accelerators (IDXD) present in Intel Xeon CPUs.

While this series contains a working implementation, it is presented
as an RFC with the goal of obtaining feedback on whether VFIO PCI IMS
is appropriate for inclusion into VFIO and whether it is
(or could be adapted to be) appropriate for supporting other
planned IMS usages you may be aware of.

Any feedback will be greatly appreciated.

Reinette

Reinette Chatre (26):
PCI/MSI: Provide stubs for IMS functions
vfio/pci: Move PCI specific check from wrapper to PCI function
vfio/pci: Use unsigned int instead of unsigned
vfio/pci: Make core interrupt callbacks accessible to all virtual
devices
vfio/pci: Split PCI interrupt management into front and backend
vfio/pci: Separate MSI and MSI-X handling
vfio/pci: Move interrupt eventfd to interrupt context
vfio/pci: Move mutex acquisition into function
vfio/pci: Move per-interrupt contexts to generic interrupt struct
vfio/pci: Move IRQ type to generic interrupt context
vfio/pci: Provide interrupt context to irq_is() and is_irq_none()
vfio/pci: Provide interrupt context to generic ops
vfio/pci: Provide interrupt context to vfio_msi_enable() and
vfio_msi_disable()
vfio/pci: Let interrupt management backend interpret interrupt index
vfio/pci: Move generic code to frontend
vfio/pci: Split interrupt context initialization
vfio/pci: Make vfio_pci_set_irqs_ioctl() available
vfio/pci: Preserve per-interrupt contexts
vfio/pci: Store Linux IRQ number in per-interrupt context
vfio/pci: Separate frontend and backend code during interrupt
enable/disable
vfio/pci: Replace backend specific calls with callbacks
vfio/pci: Introduce backend specific context initializer
vfio/pci: Support emulated interrupts
vfio/pci: Add core IMS support
vfio/pci: Add accessor for IMS index
vfio/pci: Support IMS cookie modification

drivers/vfio/pci/vfio_pci_config.c | 2 +-
drivers/vfio/pci/vfio_pci_core.c | 50 +-
drivers/vfio/pci/vfio_pci_intrs.c | 758 ++++++++++++++++++++++++-----
drivers/vfio/pci/vfio_pci_priv.h | 2 +-
include/linux/pci.h | 34 +-
include/linux/vfio_pci_core.h | 87 +++-
6 files changed, 756 insertions(+), 177 deletions(-)


base-commit: 611da07b89fdd53f140d7b33013f255bf0ed8f34
--
2.34.1


2023-10-27 17:03:36

by Reinette Chatre

Subject: [RFC PATCH V3 06/26] vfio/pci: Separate MSI and MSI-X handling

VFIO PCI interrupt management uses a single callback for both
MSI and MSI-X management, with the called functions using a boolean,
where necessary, to distinguish between MSI and MSI-X. This remains
unchanged.

Virtual device interrupt management should not be required to
use the same callback for both MSI and MSI-X. A virtual device
may not support MSI at all and provide only MSI-X interrupt
management.

Separate the MSI and MSI-X interrupt management by allowing
different callbacks for each interrupt type. For PCI devices
the callback remains the same.

Signed-off-by: Reinette Chatre <[email protected]>
---
No changes since RFC V2.

drivers/vfio/pci/vfio_pci_intrs.c | 14 +++++++++++++-
include/linux/vfio_pci_core.h | 3 +++
2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 96587acfebf0..7de906363402 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -801,6 +801,7 @@ static struct vfio_pci_intr_ops vfio_pci_intr_ops = {
.set_intx_unmask = vfio_pci_set_intx_unmask,
.set_intx_trigger = vfio_pci_set_intx_trigger,
.set_msi_trigger = vfio_pci_set_msi_trigger,
+ .set_msix_trigger = vfio_pci_set_msi_trigger,
.set_err_trigger = vfio_pci_set_err_trigger,
.set_req_trigger = vfio_pci_set_req_trigger,
};
@@ -839,7 +840,6 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
}
break;
case VFIO_PCI_MSI_IRQ_INDEX:
- case VFIO_PCI_MSIX_IRQ_INDEX:
switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
case VFIO_IRQ_SET_ACTION_MASK:
case VFIO_IRQ_SET_ACTION_UNMASK:
@@ -851,6 +851,18 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
break;
}
break;
+ case VFIO_PCI_MSIX_IRQ_INDEX:
+ switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
+ case VFIO_IRQ_SET_ACTION_MASK:
+ case VFIO_IRQ_SET_ACTION_UNMASK:
+ /* XXX Need masking support exported */
+ break;
+ case VFIO_IRQ_SET_ACTION_TRIGGER:
+ if (intr_ctx->ops->set_msix_trigger)
+ func = intr_ctx->ops->set_msix_trigger;
+ break;
+ }
+ break;
case VFIO_PCI_ERR_IRQ_INDEX:
switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
case VFIO_IRQ_SET_ACTION_TRIGGER:
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index d3fa0e49a1a8..db7ee9517d94 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -72,6 +72,9 @@ struct vfio_pci_intr_ops {
int (*set_msi_trigger)(struct vfio_pci_intr_ctx *intr_ctx,
unsigned int index, unsigned int start,
unsigned int count, uint32_t flags, void *data);
+ int (*set_msix_trigger)(struct vfio_pci_intr_ctx *intr_ctx,
+ unsigned int index, unsigned int start,
+ unsigned int count, uint32_t flags, void *data);
int (*set_err_trigger)(struct vfio_pci_intr_ctx *intr_ctx,
unsigned int index, unsigned int start,
unsigned int count, uint32_t flags, void *data);
--
2.34.1

2023-10-27 17:03:39

by Reinette Chatre

Subject: [RFC PATCH V3 15/26] vfio/pci: Move generic code to frontend

vfio_pci_set_msi_trigger() and vfio_msi_set_block() are generic
and thus appropriate to be frontend code. This means that they
should operate on the interrupt context, not on backend-specific
data.

Provide the interrupt context as parameter to vfio_pci_set_msi_trigger()
and vfio_msi_set_block() and remove all references to the PCI interrupt
management data from these functions. This enables these functions to
form part of the interrupt management frontend shared by different
interrupt management backends.

Signed-off-by: Reinette Chatre <[email protected]>
---
Changes since RFC V2:
- New patch

drivers/vfio/pci/vfio_pci_intrs.c | 28 ++++++++++++++--------------
1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index d2b80e176651..adad93c31ac6 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -415,18 +415,19 @@ static int vfio_msi_alloc_irq(struct vfio_pci_core_device *vdev,
return map.index < 0 ? map.index : map.virq;
}

-static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
+static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
unsigned int vector, int fd,
unsigned int index)
{
bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
+ struct vfio_pci_core_device *vdev = intr_ctx->priv;
struct pci_dev *pdev = vdev->pdev;
struct vfio_pci_irq_ctx *ctx;
struct eventfd_ctx *trigger;
int irq = -EINVAL, ret;
u16 cmd;

- ctx = vfio_irq_ctx_get(&vdev->intr_ctx, vector);
+ ctx = vfio_irq_ctx_get(intr_ctx, vector);

if (ctx) {
irq_bypass_unregister_producer(&ctx->producer);
@@ -437,7 +438,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
/* Interrupt stays allocated, will be freed at MSI-X disable. */
kfree(ctx->name);
eventfd_ctx_put(ctx->trigger);
- vfio_irq_ctx_free(&vdev->intr_ctx, ctx, vector);
+ vfio_irq_ctx_free(intr_ctx, ctx, vector);
}

if (fd < 0)
@@ -450,7 +451,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
return irq;
}

- ctx = vfio_irq_ctx_alloc(&vdev->intr_ctx, vector);
+ ctx = vfio_irq_ctx_alloc(intr_ctx, vector);
if (!ctx)
return -ENOMEM;

@@ -504,11 +505,11 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
out_free_name:
kfree(ctx->name);
out_free_ctx:
- vfio_irq_ctx_free(&vdev->intr_ctx, ctx, vector);
+ vfio_irq_ctx_free(intr_ctx, ctx, vector);
return ret;
}

-static int vfio_msi_set_block(struct vfio_pci_core_device *vdev,
+static int vfio_msi_set_block(struct vfio_pci_intr_ctx *intr_ctx,
unsigned int start, unsigned int count,
int32_t *fds, unsigned int index)
{
@@ -517,12 +518,12 @@ static int vfio_msi_set_block(struct vfio_pci_core_device *vdev,

for (i = 0, j = start; i < count && !ret; i++, j++) {
int fd = fds ? fds[i] : -1;
- ret = vfio_msi_set_vector_signal(vdev, j, fd, index);
+ ret = vfio_msi_set_vector_signal(intr_ctx, j, fd, index);
}

if (ret) {
for (i = start; i < j; i++)
- vfio_msi_set_vector_signal(vdev, i, -1, index);
+ vfio_msi_set_vector_signal(intr_ctx, i, -1, index);
}

return ret;
@@ -540,7 +541,7 @@ static void vfio_msi_disable(struct vfio_pci_intr_ctx *intr_ctx,
xa_for_each(&intr_ctx->ctx, i, ctx) {
vfio_virqfd_disable(&ctx->unmask);
vfio_virqfd_disable(&ctx->mask);
- vfio_msi_set_vector_signal(vdev, i, -1, index);
+ vfio_msi_set_vector_signal(intr_ctx, i, -1, index);
}

cmd = vfio_pci_memory_lock_and_enable(vdev);
@@ -668,7 +669,6 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,
unsigned int count, uint32_t flags,
void *data)
{
- struct vfio_pci_core_device *vdev = intr_ctx->priv;
struct vfio_pci_irq_ctx *ctx;
unsigned int i;

@@ -684,15 +684,15 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,
int32_t *fds = data;
int ret;

- if (vdev->intr_ctx.irq_type == index)
- return vfio_msi_set_block(vdev, start, count,
+ if (intr_ctx->irq_type == index)
+ return vfio_msi_set_block(intr_ctx, start, count,
fds, index);

ret = vfio_msi_enable(intr_ctx, start + count, index);
if (ret)
return ret;

- ret = vfio_msi_set_block(vdev, start, count, fds, index);
+ ret = vfio_msi_set_block(intr_ctx, start, count, fds, index);
if (ret)
vfio_msi_disable(intr_ctx, index);

@@ -703,7 +703,7 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,
return -EINVAL;

for (i = start; i < start + count; i++) {
- ctx = vfio_irq_ctx_get(&vdev->intr_ctx, i);
+ ctx = vfio_irq_ctx_get(intr_ctx, i);
if (!ctx)
continue;
if (flags & VFIO_IRQ_SET_DATA_NONE) {
--
2.34.1

2023-10-27 17:03:58

by Reinette Chatre

Subject: [RFC PATCH V3 16/26] vfio/pci: Split interrupt context initialization

struct vfio_pci_intr_ctx is the context associated with the interrupts
of a virtual device. Initializing the interrupt context involves
backend-specific setup required by the particular interrupt management
backend as well as common setup required by all interrupt
management backends.

Split interrupt context initialization into common and interrupt
management backend specific calls. The entrypoint will be the
initialization of a particular interrupt management backend which
in turn calls the common initialization.

Signed-off-by: Reinette Chatre <[email protected]>
---
No changes since RFC V2.

drivers/vfio/pci/vfio_pci_intrs.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index adad93c31ac6..14131d5288e3 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -801,6 +801,18 @@ static int vfio_pci_set_req_trigger(struct vfio_pci_intr_ctx *intr_ctx,
count, flags, data);
}

+static void _vfio_pci_init_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx)
+{
+ intr_ctx->irq_type = VFIO_PCI_NUM_IRQS;
+ mutex_init(&intr_ctx->igate);
+ xa_init(&intr_ctx->ctx);
+}
+
+static void _vfio_pci_release_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx)
+{
+ mutex_destroy(&intr_ctx->igate);
+}
+
static struct vfio_pci_intr_ops vfio_pci_intr_ops = {
.set_intx_mask = vfio_pci_set_intx_mask,
.set_intx_unmask = vfio_pci_set_intx_unmask,
@@ -814,17 +826,15 @@ static struct vfio_pci_intr_ops vfio_pci_intr_ops = {
void vfio_pci_init_intr_ctx(struct vfio_pci_core_device *vdev,
struct vfio_pci_intr_ctx *intr_ctx)
{
- intr_ctx->irq_type = VFIO_PCI_NUM_IRQS;
+ _vfio_pci_init_intr_ctx(intr_ctx);
intr_ctx->ops = &vfio_pci_intr_ops;
intr_ctx->priv = vdev;
- mutex_init(&intr_ctx->igate);
- xa_init(&intr_ctx->ctx);
}
EXPORT_SYMBOL_GPL(vfio_pci_init_intr_ctx);

void vfio_pci_release_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx)
{
- mutex_destroy(&intr_ctx->igate);
+ _vfio_pci_release_intr_ctx(intr_ctx);
}
EXPORT_SYMBOL_GPL(vfio_pci_release_intr_ctx);

--
2.34.1

2023-10-27 17:04:01

by Reinette Chatre

Subject: [RFC PATCH V3 26/26] vfio/pci: Support IMS cookie modification

IMS supports an implementation specific cookie that is associated
with each interrupt. By default the IMS interrupt allocation backend
will assign a default cookie to a new interrupt instance.

Add support for a virtual device driver to set the interrupt instance
specific cookie. For example, the virtual device driver may intercept
the guest's MMIO write that configures a new PASID for a particular
interrupt. Calling vfio_pci_ims_set_cookie() with the new PASID value
as the IMS cookie enables subsequent interrupts to be allocated with
accurate data.

Signed-off-by: Reinette Chatre <[email protected]>
---
No changes since RFC V2.

drivers/vfio/pci/vfio_pci_intrs.c | 53 +++++++++++++++++++++++++++++++
include/linux/vfio_pci_core.h | 3 ++
2 files changed, 56 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 32ebc8fec4c4..5dc22dd9390e 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -1188,6 +1188,59 @@ int vfio_pci_ims_hwirq(struct vfio_pci_intr_ctx *intr_ctx, unsigned int vector)
}
EXPORT_SYMBOL_GPL(vfio_pci_ims_hwirq);

+/*
+ * vfio_pci_ims_set_cookie() - Set unique cookie for vector.
+ * @intr_ctx: Interrupt context.
+ * @vector: Vector.
+ * @icookie: New cookie for @vector.
+ *
+ * When a new IMS interrupt is allocated for @vector, it will be
+ * assigned @icookie.
+ */
+int vfio_pci_ims_set_cookie(struct vfio_pci_intr_ctx *intr_ctx,
+ unsigned int vector,
+ union msi_instance_cookie *icookie)
+{
+ struct vfio_pci_irq_ctx *ctx;
+ int ret = -EINVAL;
+
+ mutex_lock(&intr_ctx->igate);
+
+ if (!intr_ctx->ims_backed_irq)
+ goto out_unlock;
+
+ ctx = vfio_irq_ctx_get(intr_ctx, vector);
+ if (ctx) {
+ if (WARN_ON_ONCE(ctx->emulated)) {
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+ ctx->icookie = *icookie;
+ ret = 0;
+ goto out_unlock;
+ }
+
+ ctx = kzalloc(sizeof(*ctx), GFP_KERNEL_ACCOUNT);
+ if (!ctx) {
+ ret = -ENOMEM;
+ goto out_unlock;
+ }
+
+ ctx->icookie = *icookie;
+ ret = xa_insert(&intr_ctx->ctx, vector, ctx, GFP_KERNEL_ACCOUNT);
+ if (ret) {
+ kfree(ctx);
+ goto out_unlock;
+ }
+
+ ret = 0;
+
+out_unlock:
+ mutex_unlock(&intr_ctx->igate);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_ims_set_cookie);
+
int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
unsigned int index, unsigned int start,
unsigned int count, void *data)
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index dbc77839ef26..b989b533e852 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -181,6 +181,9 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
unsigned int index, unsigned int start,
unsigned int count, void *data);
int vfio_pci_ims_hwirq(struct vfio_pci_intr_ctx *intr_ctx, unsigned int vector);
+int vfio_pci_ims_set_cookie(struct vfio_pci_intr_ctx *intr_ctx,
+ unsigned int vector,
+ union msi_instance_cookie *icookie);
void vfio_pci_send_signal(struct vfio_pci_intr_ctx *intr_ctx, unsigned int vector);
int vfio_pci_set_emulated(struct vfio_pci_intr_ctx *intr_ctx,
unsigned int start, unsigned int count);
--
2.34.1

2023-10-27 17:04:13

by Reinette Chatre

Subject: [RFC PATCH V3 22/26] vfio/pci: Introduce backend specific context initializer

The per-interrupt context may contain backend-specific data.

Call a backend provided initializer on per-interrupt context
creation.

Signed-off-by: Reinette Chatre <[email protected]>
---
Changes since RFC V2:
- New patch

drivers/vfio/pci/vfio_pci_intrs.c | 8 ++++++++
include/linux/vfio_pci_core.h | 2 ++
2 files changed, 10 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 1e6376b048de..8c86f2d6229f 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -73,6 +73,14 @@ vfio_irq_ctx_alloc(struct vfio_pci_intr_ctx *intr_ctx, unsigned long index)
if (!ctx)
return NULL;

+ if (intr_ctx->ops->init_irq_ctx) {
+ ret = intr_ctx->ops->init_irq_ctx(intr_ctx, ctx);
+ if (ret < 0) {
+ kfree(ctx);
+ return NULL;
+ }
+ }
+
ret = xa_insert(&intr_ctx->ctx, index, ctx, GFP_KERNEL_ACCOUNT);
if (ret) {
kfree(ctx);
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index f0951084a26f..d5140a732741 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -106,6 +106,8 @@ struct vfio_pci_intr_ops {
unsigned int vector);
char *(*msi_device_name)(struct vfio_pci_intr_ctx *intr_ctx,
unsigned int vector, unsigned int index);
+ int (*init_irq_ctx)(struct vfio_pci_intr_ctx *intr_ctx,
+ struct vfio_pci_irq_ctx *ctx);
};

struct vfio_pci_core_device {
--
2.34.1

2023-10-27 17:04:23

by Reinette Chatre

Subject: [RFC PATCH V3 04/26] vfio/pci: Make core interrupt callbacks accessible to all virtual devices

The functions handling actions on interrupts for a virtual PCI device
triggered by VFIO_DEVICE_SET_IRQS ioctl() expect to act on a passthrough
PCI device represented by a struct vfio_pci_core_device.

A virtual device can support MSI-X while not being a passthrough PCI
device and thus not be represented by a struct vfio_pci_core_device.

To support MSI-X in virtual devices, their drivers need to be able
to interact with the MSI-X interrupt management, and thus the
interrupt management should not require struct
vfio_pci_core_device.

Introduce struct vfio_pci_intr_ctx that will contain a virtual device's
interrupt context to be managed by an interrupt management backend.
The first supported backend is the existing PCI device interrupt
management. Modify the core VFIO PCI interrupt management functions
to expect this structure. As a backend managing interrupts of
passthrough PCI devices the existing VFIO PCI functions continue to
operate on an actual PCI device represented by
struct vfio_pci_core_device that is provided via a private pointer.

More members will be added to struct vfio_pci_intr_ctx as members
unique to interrupt context are transitioned from struct
vfio_pci_core_device.

Signed-off-by: Reinette Chatre <[email protected]>
---
Changes since RFC V2:
- Improve changelog and comments.

drivers/vfio/pci/vfio_pci_core.c | 7 ++++---
drivers/vfio/pci/vfio_pci_intrs.c | 29 ++++++++++++++++++++---------
drivers/vfio/pci/vfio_pci_priv.h | 2 +-
include/linux/vfio_pci_core.h | 9 +++++++++
4 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 1929103ee59a..bb8181444c41 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -594,7 +594,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
/* Stop the device from further DMA */
pci_clear_master(pdev);

- vfio_pci_set_irqs_ioctl(vdev, VFIO_IRQ_SET_DATA_NONE |
+ vfio_pci_set_irqs_ioctl(&vdev->intr_ctx, VFIO_IRQ_SET_DATA_NONE |
VFIO_IRQ_SET_ACTION_TRIGGER,
vdev->irq_type, 0, 0, NULL);

@@ -1216,8 +1216,8 @@ static int vfio_pci_ioctl_set_irqs(struct vfio_pci_core_device *vdev,

mutex_lock(&vdev->igate);

- ret = vfio_pci_set_irqs_ioctl(vdev, hdr.flags, hdr.index, hdr.start,
- hdr.count, data);
+ ret = vfio_pci_set_irqs_ioctl(&vdev->intr_ctx, hdr.flags, hdr.index,
+ hdr.start, hdr.count, data);

mutex_unlock(&vdev->igate);
kfree(data);
@@ -2166,6 +2166,7 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
INIT_LIST_HEAD(&vdev->sriov_pfs_item);
init_rwsem(&vdev->memory_lock);
xa_init(&vdev->ctx);
+ vdev->intr_ctx.priv = vdev;

return 0;
}
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 9f4f3ab48f87..af1053873eaa 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -553,11 +553,13 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
/*
* IOCTL support
*/
-static int vfio_pci_set_intx_unmask(struct vfio_pci_core_device *vdev,
+static int vfio_pci_set_intx_unmask(struct vfio_pci_intr_ctx *intr_ctx,
unsigned int index, unsigned int start,
unsigned int count, uint32_t flags,
void *data)
{
+ struct vfio_pci_core_device *vdev = intr_ctx->priv;
+
if (!is_intx(vdev) || start != 0 || count != 1)
return -EINVAL;

@@ -585,10 +587,12 @@ static int vfio_pci_set_intx_unmask(struct vfio_pci_core_device *vdev,
return 0;
}

-static int vfio_pci_set_intx_mask(struct vfio_pci_core_device *vdev,
+static int vfio_pci_set_intx_mask(struct vfio_pci_intr_ctx *intr_ctx,
unsigned int index, unsigned int start,
unsigned int count, uint32_t flags, void *data)
{
+ struct vfio_pci_core_device *vdev = intr_ctx->priv;
+
if (!is_intx(vdev) || start != 0 || count != 1)
return -EINVAL;

@@ -605,11 +609,13 @@ static int vfio_pci_set_intx_mask(struct vfio_pci_core_device *vdev,
return 0;
}

-static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
+static int vfio_pci_set_intx_trigger(struct vfio_pci_intr_ctx *intr_ctx,
unsigned int index, unsigned int start,
unsigned int count, uint32_t flags,
void *data)
{
+ struct vfio_pci_core_device *vdev = intr_ctx->priv;
+
if (is_intx(vdev) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
vfio_intx_disable(vdev);
return 0;
@@ -649,11 +655,12 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
return 0;
}

-static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
+static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,
unsigned int index, unsigned int start,
unsigned int count, uint32_t flags,
void *data)
{
+ struct vfio_pci_core_device *vdev = intr_ctx->priv;
struct vfio_pci_irq_ctx *ctx;
unsigned int i;
bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
@@ -758,11 +765,13 @@ static int vfio_pci_set_ctx_trigger_single(struct eventfd_ctx **ctx,
return -EINVAL;
}

-static int vfio_pci_set_err_trigger(struct vfio_pci_core_device *vdev,
+static int vfio_pci_set_err_trigger(struct vfio_pci_intr_ctx *intr_ctx,
unsigned int index, unsigned int start,
unsigned int count, uint32_t flags,
void *data)
{
+ struct vfio_pci_core_device *vdev = intr_ctx->priv;
+
if (!pci_is_pcie(vdev->pdev))
return -ENOTTY;

@@ -773,11 +782,13 @@ static int vfio_pci_set_err_trigger(struct vfio_pci_core_device *vdev,
count, flags, data);
}

-static int vfio_pci_set_req_trigger(struct vfio_pci_core_device *vdev,
+static int vfio_pci_set_req_trigger(struct vfio_pci_intr_ctx *intr_ctx,
unsigned int index, unsigned int start,
unsigned int count, uint32_t flags,
void *data)
{
+ struct vfio_pci_core_device *vdev = intr_ctx->priv;
+
if (index != VFIO_PCI_REQ_IRQ_INDEX || start != 0 || count > 1)
return -EINVAL;

@@ -785,11 +796,11 @@ static int vfio_pci_set_req_trigger(struct vfio_pci_core_device *vdev,
count, flags, data);
}

-int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
+int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
unsigned int index, unsigned int start,
unsigned int count, void *data)
{
- int (*func)(struct vfio_pci_core_device *vdev, unsigned int index,
+ int (*func)(struct vfio_pci_intr_ctx *intr_ctx, unsigned int index,
unsigned int start, unsigned int count, uint32_t flags,
void *data) = NULL;

@@ -838,5 +849,5 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
if (!func)
return -ENOTTY;

- return func(vdev, index, start, count, flags, data);
+ return func(intr_ctx, index, start, count, flags, data);
}
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index 5e4fa69aee16..6dddcfe7ab19 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -26,7 +26,7 @@ struct vfio_pci_ioeventfd {
bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev);
void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev);

-int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
+int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
unsigned index, unsigned start, unsigned count,
void *data);

diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 562e8754869d..38355a4817fd 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -49,6 +49,14 @@ struct vfio_pci_region {
u32 flags;
};

+/*
+ * Interrupt context of virtual PCI device
+ * @priv: Private data of interrupt management backend
+ */
+struct vfio_pci_intr_ctx {
+ void *priv;
+};
+
struct vfio_pci_core_device {
struct vfio_device vdev;
struct pci_dev *pdev;
@@ -96,6 +104,7 @@ struct vfio_pci_core_device {
struct mutex vma_lock;
struct list_head vma_list;
struct rw_semaphore memory_lock;
+ struct vfio_pci_intr_ctx intr_ctx;
};

/* Will be exported for vfio pci drivers usage */
--
2.34.1

2023-10-27 17:04:23

by Reinette Chatre

Subject: [RFC PATCH V3 10/26] vfio/pci: Move IRQ type to generic interrupt context

The type of interrupt in use within the guest is not unique to
PCI devices and is needed by other virtual devices supporting
interrupts.

Move interrupt type to the generic interrupt context struct
vfio_pci_intr_ctx.

Signed-off-by: Reinette Chatre <[email protected]>
---
Question for maintainers:
irq_type is accessed in ioctl() flow as well as other
flows. It is not clear to me how it is protected against
concurrent access. Should accesses outside of ioctl() flow
take the mutex?

No changes since RFC V2.

drivers/vfio/pci/vfio_pci_config.c | 2 +-
drivers/vfio/pci/vfio_pci_core.c | 5 ++---
drivers/vfio/pci/vfio_pci_intrs.c | 21 +++++++++++----------
include/linux/vfio_pci_core.h | 3 ++-
4 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 7e2e62ab0869..2535bdbc016d 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -1168,7 +1168,7 @@ static int vfio_msi_config_write(struct vfio_pci_core_device *vdev, int pos,
flags = le16_to_cpu(*pflags);

/* MSI is enabled via ioctl */
- if (vdev->irq_type != VFIO_PCI_MSI_IRQ_INDEX)
+ if (vdev->intr_ctx.irq_type != VFIO_PCI_MSI_IRQ_INDEX)
flags &= ~PCI_MSI_FLAGS_ENABLE;

/* Check queue size */
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index cf303a9555f0..34109ed38454 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -427,7 +427,7 @@ static int vfio_pci_core_runtime_suspend(struct device *dev)
* vfio_pci_intx_mask() will return false and in that case, INTx
* should not be unmasked in the runtime resume.
*/
- vdev->pm_intx_masked = ((vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX) &&
+ vdev->pm_intx_masked = ((vdev->intr_ctx.irq_type == VFIO_PCI_INTX_IRQ_INDEX) &&
vfio_pci_intx_mask(vdev));

return 0;
@@ -596,7 +596,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)

vfio_pci_set_irqs_ioctl(&vdev->intr_ctx, VFIO_IRQ_SET_DATA_NONE |
VFIO_IRQ_SET_ACTION_TRIGGER,
- vdev->irq_type, 0, 0, NULL);
+ vdev->intr_ctx.irq_type, 0, 0, NULL);

/* Device closed, don't need mutex here */
list_for_each_entry_safe(ioeventfd, ioeventfd_tmp,
@@ -2153,7 +2153,6 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
container_of(core_vdev, struct vfio_pci_core_device, vdev);

vdev->pdev = to_pci_dev(core_vdev->dev);
- vdev->irq_type = VFIO_PCI_NUM_IRQS;
spin_lock_init(&vdev->irqlock);
mutex_init(&vdev->ioeventfds_lock);
INIT_LIST_HEAD(&vdev->dummy_resources_list);
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 3cfd7fdec87b..858795ba50fe 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -33,19 +33,19 @@ struct vfio_pci_irq_ctx {

static bool irq_is(struct vfio_pci_core_device *vdev, int type)
{
- return vdev->irq_type == type;
+ return vdev->intr_ctx.irq_type == type;
}

static bool is_intx(struct vfio_pci_core_device *vdev)
{
- return vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX;
+ return vdev->intr_ctx.irq_type == VFIO_PCI_INTX_IRQ_INDEX;
}

static bool is_irq_none(struct vfio_pci_core_device *vdev)
{
- return !(vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX ||
- vdev->irq_type == VFIO_PCI_MSI_IRQ_INDEX ||
- vdev->irq_type == VFIO_PCI_MSIX_IRQ_INDEX);
+ return !(vdev->intr_ctx.irq_type == VFIO_PCI_INTX_IRQ_INDEX ||
+ vdev->intr_ctx.irq_type == VFIO_PCI_MSI_IRQ_INDEX ||
+ vdev->intr_ctx.irq_type == VFIO_PCI_MSIX_IRQ_INDEX);
}

static
@@ -255,7 +255,7 @@ static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
if (vdev->pci_2_3)
pci_intx(vdev->pdev, !ctx->masked);

- vdev->irq_type = VFIO_PCI_INTX_IRQ_INDEX;
+ vdev->intr_ctx.irq_type = VFIO_PCI_INTX_IRQ_INDEX;

return 0;
}
@@ -331,7 +331,7 @@ static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
vfio_virqfd_disable(&ctx->mask);
}
vfio_intx_set_signal(vdev, -1);
- vdev->irq_type = VFIO_PCI_NUM_IRQS;
+ vdev->intr_ctx.irq_type = VFIO_PCI_NUM_IRQS;
vfio_irq_ctx_free(vdev, ctx, 0);
}

@@ -367,7 +367,7 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi
}
vfio_pci_memory_unlock_and_restore(vdev, cmd);

- vdev->irq_type = msix ? VFIO_PCI_MSIX_IRQ_INDEX :
+ vdev->intr_ctx.irq_type = msix ? VFIO_PCI_MSIX_IRQ_INDEX :
VFIO_PCI_MSI_IRQ_INDEX;

if (!msix) {
@@ -547,7 +547,7 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
if (vdev->nointx)
pci_intx(pdev, 0);

- vdev->irq_type = VFIO_PCI_NUM_IRQS;
+ vdev->intr_ctx.irq_type = VFIO_PCI_NUM_IRQS;
}

/*
@@ -677,7 +677,7 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,
int32_t *fds = data;
int ret;

- if (vdev->irq_type == index)
+ if (vdev->intr_ctx.irq_type == index)
return vfio_msi_set_block(vdev, start, count,
fds, msix);

@@ -807,6 +807,7 @@ static struct vfio_pci_intr_ops vfio_pci_intr_ops = {
void vfio_pci_init_intr_ctx(struct vfio_pci_core_device *vdev,
struct vfio_pci_intr_ctx *intr_ctx)
{
+ intr_ctx->irq_type = VFIO_PCI_NUM_IRQS;
intr_ctx->ops = &vfio_pci_intr_ops;
intr_ctx->priv = vdev;
mutex_init(&intr_ctx->igate);
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 0f9df87aedd9..e666c19da223 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -57,6 +57,7 @@ struct vfio_pci_region {
* @err_trigger: Eventfd associated with error reporting IRQ
* @req_trigger: Eventfd associated with device request notification
* @ctx: Per-interrupt context indexed by vector
+ * @irq_type: Type of interrupt from guest perspective
*/
struct vfio_pci_intr_ctx {
const struct vfio_pci_intr_ops *ops;
@@ -65,6 +66,7 @@ struct vfio_pci_intr_ctx {
struct eventfd_ctx *err_trigger;
struct eventfd_ctx *req_trigger;
struct xarray ctx;
+ int irq_type;
};

struct vfio_pci_intr_ops {
@@ -100,7 +102,6 @@ struct vfio_pci_core_device {
u8 *vconfig;
struct perm_bits *msi_perm;
spinlock_t irqlock;
- int irq_type;
int num_regions;
struct vfio_pci_region *region;
u8 msi_qmax;
--
2.34.1
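[Editor's note] The shape of this refactor can be sketched in a few lines of plain userspace C. This is an illustrative toy, not kernel code: the struct and helper names only loosely mirror the patch, and the enum values are stand-ins for the VFIO_PCI_*_IRQ_INDEX constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the VFIO_PCI_*_IRQ_INDEX constants. */
enum { INTX_IRQ_INDEX, MSI_IRQ_INDEX, MSIX_IRQ_INDEX, NUM_IRQS };

/* Generic interrupt context: irq_type now lives here ... */
struct intr_ctx {
	int irq_type;	/* interrupt type from the guest's perspective */
	void *priv;	/* backend-private data */
};

/* ... instead of in the PCI-specific device structure. */
struct pci_core_device {
	struct intr_ctx intr_ctx;
};

/* Helpers take the generic context, so non-PCI backends can share them. */
bool is_irq_none(struct intr_ctx *ctx)
{
	return !(ctx->irq_type == INTX_IRQ_INDEX ||
		 ctx->irq_type == MSI_IRQ_INDEX ||
		 ctx->irq_type == MSIX_IRQ_INDEX);
}

void init_intr_ctx(struct pci_core_device *dev, struct intr_ctx *ctx)
{
	ctx->irq_type = NUM_IRQS;	/* no interrupt configured yet */
	ctx->priv = dev;
}
```

The same is_irq_none() style helper can then serve any backend that embeds a struct of this shape.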

2023-10-27 17:05:11

by Reinette Chatre

Subject: [RFC PATCH V3 08/26] vfio/pci: Move mutex acquisition into function

vfio_pci_set_irqs_ioctl() is the entry point for interrupt management
via the VFIO_DEVICE_SET_IRQS ioctl(). It can be called from a virtual
device driver after the driver's callbacks have been configured to
support the needed interrupt management.

The igate mutex is obtained before vfio_pci_set_irqs_ioctl() to protect
against concurrent changes to the interrupt context. It should not be
necessary for every user of vfio_pci_set_irqs_ioctl() to remember to
take the mutex.

Acquire and release the mutex within vfio_pci_set_irqs_ioctl().

Signed-off-by: Reinette Chatre <[email protected]>
---
Changes since RFC V2:
- Improve changelog.

drivers/vfio/pci/vfio_pci_core.c | 2 --
drivers/vfio/pci/vfio_pci_intrs.c | 10 ++++++++--
2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 5c9bd5d2db53..bf4de137ad2f 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1214,12 +1214,10 @@ static int vfio_pci_ioctl_set_irqs(struct vfio_pci_core_device *vdev,
return PTR_ERR(data);
}

- mutex_lock(&vdev->intr_ctx.igate);

ret = vfio_pci_set_irqs_ioctl(&vdev->intr_ctx, hdr.flags, hdr.index,
hdr.start, hdr.count, data);

- mutex_unlock(&vdev->intr_ctx.igate);
kfree(data);

return ret;
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index a4c8b589c87b..5d600548b5d7 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -826,7 +826,9 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
int (*func)(struct vfio_pci_intr_ctx *intr_ctx, unsigned int index,
unsigned int start, unsigned int count, uint32_t flags,
void *data) = NULL;
+ int ret = -ENOTTY;

+ mutex_lock(&intr_ctx->igate);
switch (index) {
case VFIO_PCI_INTX_IRQ_INDEX:
switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
@@ -887,7 +889,11 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
}

if (!func)
- return -ENOTTY;
+ goto out_unlock;
+
+ ret = func(intr_ctx, index, start, count, flags, data);

- return func(intr_ctx, index, start, count, flags, data);
+out_unlock:
+ mutex_unlock(&intr_ctx->igate);
+ return ret;
}
--
2.34.1
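[Editor's note] The locking pattern this patch introduces (acquire inside the function, every exit funneled through one unlock) can be sketched in userspace C, with a pthread mutex standing in for the kernel igate mutex. All names and the toy index check are illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

struct intr_ctx {
	pthread_mutex_t igate;	/* stand-in for the kernel igate mutex */
	int irq_type;
};

typedef int (*set_irq_fn)(struct intr_ctx *ctx, unsigned int index);

int trigger_handler(struct intr_ctx *ctx, unsigned int index)
{
	ctx->irq_type = (int)index;
	return 0;
}

int set_irqs_ioctl(struct intr_ctx *ctx, unsigned int index)
{
	set_irq_fn func = NULL;
	int ret = -ENOTTY;	/* default when no handler matches */

	pthread_mutex_lock(&ctx->igate);
	if (index < 3)		/* toy stand-in for the index/flags switch */
		func = trigger_handler;

	if (!func)
		goto out_unlock; /* every exit goes through the unlock */

	ret = func(ctx, index);
out_unlock:
	pthread_mutex_unlock(&ctx->igate);
	return ret;
}
```

Callers no longer wrap the call in lock/unlock, so a driver reusing the entry point cannot forget the mutex.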

2023-10-27 17:05:26

by Reinette Chatre

Subject: [RFC PATCH V3 09/26] vfio/pci: Move per-interrupt contexts to generic interrupt struct

VFIO PCI interrupt management maintains per-interrupt context within an
xarray, using the interrupt vector as the index.

Move the per-interrupt context to the generic interrupt context in
struct vfio_pci_intr_ctx so that it can be managed by different
backends.

Signed-off-by: Reinette Chatre <[email protected]>
---
Changes since RFC V2:
- Improve changelog.

drivers/vfio/pci/vfio_pci_core.c | 1 -
drivers/vfio/pci/vfio_pci_intrs.c | 9 +++++----
include/linux/vfio_pci_core.h | 3 ++-
3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index bf4de137ad2f..cf303a9555f0 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -2162,7 +2162,6 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
INIT_LIST_HEAD(&vdev->vma_list);
INIT_LIST_HEAD(&vdev->sriov_pfs_item);
init_rwsem(&vdev->memory_lock);
- xa_init(&vdev->ctx);
vfio_pci_init_intr_ctx(vdev, &vdev->intr_ctx);

return 0;
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 5d600548b5d7..3cfd7fdec87b 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -52,13 +52,13 @@ static
struct vfio_pci_irq_ctx *vfio_irq_ctx_get(struct vfio_pci_core_device *vdev,
unsigned long index)
{
- return xa_load(&vdev->ctx, index);
+ return xa_load(&vdev->intr_ctx.ctx, index);
}

static void vfio_irq_ctx_free(struct vfio_pci_core_device *vdev,
struct vfio_pci_irq_ctx *ctx, unsigned long index)
{
- xa_erase(&vdev->ctx, index);
+ xa_erase(&vdev->intr_ctx.ctx, index);
kfree(ctx);
}

@@ -72,7 +72,7 @@ vfio_irq_ctx_alloc(struct vfio_pci_core_device *vdev, unsigned long index)
if (!ctx)
return NULL;

- ret = xa_insert(&vdev->ctx, index, ctx, GFP_KERNEL_ACCOUNT);
+ ret = xa_insert(&vdev->intr_ctx.ctx, index, ctx, GFP_KERNEL_ACCOUNT);
if (ret) {
kfree(ctx);
return NULL;
@@ -530,7 +530,7 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
unsigned long i;
u16 cmd;

- xa_for_each(&vdev->ctx, i, ctx) {
+ xa_for_each(&vdev->intr_ctx.ctx, i, ctx) {
vfio_virqfd_disable(&ctx->unmask);
vfio_virqfd_disable(&ctx->mask);
vfio_msi_set_vector_signal(vdev, i, -1, msix);
@@ -810,6 +810,7 @@ void vfio_pci_init_intr_ctx(struct vfio_pci_core_device *vdev,
intr_ctx->ops = &vfio_pci_intr_ops;
intr_ctx->priv = vdev;
mutex_init(&intr_ctx->igate);
+ xa_init(&intr_ctx->ctx);
}
EXPORT_SYMBOL_GPL(vfio_pci_init_intr_ctx);

diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 1eb5842cff11..0f9df87aedd9 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -56,6 +56,7 @@ struct vfio_pci_region {
* @igate: Protects members of struct vfio_pci_intr_ctx
* @err_trigger: Eventfd associated with error reporting IRQ
* @req_trigger: Eventfd associated with device request notification
+ * @ctx: Per-interrupt context indexed by vector
*/
struct vfio_pci_intr_ctx {
const struct vfio_pci_intr_ops *ops;
@@ -63,6 +64,7 @@ struct vfio_pci_intr_ctx {
struct mutex igate;
struct eventfd_ctx *err_trigger;
struct eventfd_ctx *req_trigger;
+ struct xarray ctx;
};

struct vfio_pci_intr_ops {
@@ -98,7 +100,6 @@ struct vfio_pci_core_device {
u8 *vconfig;
struct perm_bits *msi_perm;
spinlock_t irqlock;
- struct xarray ctx;
int irq_type;
int num_regions;
struct vfio_pci_region *region;
--
2.34.1
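[Editor's note] The ownership move can be illustrated with a toy userspace model in which a fixed-size pointer table stands in for the kernel xarray. The helper names loosely mirror vfio_irq_ctx_get()/alloc()/free(), but this is a sketch of the pattern only, not the actual kernel code.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_VECTORS 8

struct irq_ctx {
	int virq;	/* illustrative per-interrupt property */
};

/* The generic interrupt context owns the per-interrupt contexts. */
struct intr_ctx {
	struct irq_ctx *ctx[MAX_VECTORS];  /* toy stand-in for the xarray */
};

struct irq_ctx *irq_ctx_get(struct intr_ctx *ic, unsigned long index)
{
	return index < MAX_VECTORS ? ic->ctx[index] : NULL;  /* xa_load() */
}

struct irq_ctx *irq_ctx_alloc(struct intr_ctx *ic, unsigned long index)
{
	struct irq_ctx *c;

	if (index >= MAX_VECTORS || ic->ctx[index])
		return NULL;	/* like xa_insert(), refuse duplicates */
	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;
	ic->ctx[index] = c;
	return c;
}

void irq_ctx_free(struct intr_ctx *ic, unsigned long index)
{
	if (index >= MAX_VECTORS)
		return;
	free(ic->ctx[index]);	/* like xa_erase() followed by kfree() */
	ic->ctx[index] = NULL;
}
```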

2023-10-27 17:06:00

by Reinette Chatre

Subject: [RFC PATCH V3 14/26] vfio/pci: Let interrupt management backend interpret interrupt index

vfio_pci_set_msi_trigger() and vfio_msi_set_block() are generic
and can be shared by different interrupt backends. This implies
that these functions should not interpret user-provided parameters
but should instead pass them to the backend-specific code for
interpretation.

Instead of assuming that only MSI or MSI-X can be provided via the
index and passing a boolean based on what was received, pass the
actual index to the backend for interpretation.

Signed-off-by: Reinette Chatre <[email protected]>
---
Changes since RFC V2:
- New patch

drivers/vfio/pci/vfio_pci_intrs.c | 38 +++++++++++++++++--------------
1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index ad3f9c1baccc..d2b80e176651 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -346,17 +346,20 @@ static irqreturn_t vfio_msihandler(int irq, void *arg)
return IRQ_HANDLED;
}

-static int vfio_msi_enable(struct vfio_pci_intr_ctx *intr_ctx, int nvec, bool msix)
+static int vfio_msi_enable(struct vfio_pci_intr_ctx *intr_ctx, int nvec,
+ unsigned int index)
{
struct vfio_pci_core_device *vdev = intr_ctx->priv;
struct pci_dev *pdev = vdev->pdev;
- unsigned int flag = msix ? PCI_IRQ_MSIX : PCI_IRQ_MSI;
+ unsigned int flag;
int ret;
u16 cmd;

if (!is_irq_none(intr_ctx))
return -EINVAL;

+ flag = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? PCI_IRQ_MSIX : PCI_IRQ_MSI;
+
/* return the number of supported vectors if we can't get all: */
cmd = vfio_pci_memory_lock_and_enable(vdev);
ret = pci_alloc_irq_vectors(pdev, 1, nvec, flag);
@@ -368,10 +371,9 @@ static int vfio_msi_enable(struct vfio_pci_intr_ctx *intr_ctx, int nvec, bool ms
}
vfio_pci_memory_unlock_and_restore(vdev, cmd);

- intr_ctx->irq_type = msix ? VFIO_PCI_MSIX_IRQ_INDEX :
- VFIO_PCI_MSI_IRQ_INDEX;
+ intr_ctx->irq_type = index;

- if (!msix) {
+ if (index == VFIO_PCI_MSI_IRQ_INDEX) {
/*
* Compute the virtual hardware field for max msi vectors -
* it is the log base 2 of the number of vectors.
@@ -414,8 +416,10 @@ static int vfio_msi_alloc_irq(struct vfio_pci_core_device *vdev,
}

static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
- unsigned int vector, int fd, bool msix)
+ unsigned int vector, int fd,
+ unsigned int index)
{
+ bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
struct pci_dev *pdev = vdev->pdev;
struct vfio_pci_irq_ctx *ctx;
struct eventfd_ctx *trigger;
@@ -506,25 +510,26 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,

static int vfio_msi_set_block(struct vfio_pci_core_device *vdev,
unsigned int start, unsigned int count,
- int32_t *fds, bool msix)
+ int32_t *fds, unsigned int index)
{
unsigned int i, j;
int ret = 0;

for (i = 0, j = start; i < count && !ret; i++, j++) {
int fd = fds ? fds[i] : -1;
- ret = vfio_msi_set_vector_signal(vdev, j, fd, msix);
+ ret = vfio_msi_set_vector_signal(vdev, j, fd, index);
}

if (ret) {
for (i = start; i < j; i++)
- vfio_msi_set_vector_signal(vdev, i, -1, msix);
+ vfio_msi_set_vector_signal(vdev, i, -1, index);
}

return ret;
}

-static void vfio_msi_disable(struct vfio_pci_intr_ctx *intr_ctx, bool msix)
+static void vfio_msi_disable(struct vfio_pci_intr_ctx *intr_ctx,
+ unsigned int index)
{
struct vfio_pci_core_device *vdev = intr_ctx->priv;
struct pci_dev *pdev = vdev->pdev;
@@ -535,7 +540,7 @@ static void vfio_msi_disable(struct vfio_pci_intr_ctx *intr_ctx, bool msix)
xa_for_each(&intr_ctx->ctx, i, ctx) {
vfio_virqfd_disable(&ctx->unmask);
vfio_virqfd_disable(&ctx->mask);
- vfio_msi_set_vector_signal(vdev, i, -1, msix);
+ vfio_msi_set_vector_signal(vdev, i, -1, index);
}

cmd = vfio_pci_memory_lock_and_enable(vdev);
@@ -666,10 +671,9 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,
struct vfio_pci_core_device *vdev = intr_ctx->priv;
struct vfio_pci_irq_ctx *ctx;
unsigned int i;
- bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;

if (irq_is(intr_ctx, index) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
- vfio_msi_disable(intr_ctx, msix);
+ vfio_msi_disable(intr_ctx, index);
return 0;
}

@@ -682,15 +686,15 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,

if (vdev->intr_ctx.irq_type == index)
return vfio_msi_set_block(vdev, start, count,
- fds, msix);
+ fds, index);

- ret = vfio_msi_enable(intr_ctx, start + count, msix);
+ ret = vfio_msi_enable(intr_ctx, start + count, index);
if (ret)
return ret;

- ret = vfio_msi_set_block(vdev, start, count, fds, msix);
+ ret = vfio_msi_set_block(vdev, start, count, fds, index);
if (ret)
- vfio_msi_disable(intr_ctx, msix);
+ vfio_msi_disable(intr_ctx, index);

return ret;
}
--
2.34.1
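[Editor's note] The division of labor this patch aims for (the generic loop forwards the opaque index; only the backend interprets it) can be sketched in userspace C. The enum values and the table of signals are illustrative stand-ins, not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

enum { INTX_IRQ_INDEX, MSI_IRQ_INDEX, MSIX_IRQ_INDEX };

int signals[16];	/* toy record of which fd backs each vector */

/* Backend-specific: only this function interprets the index. */
int set_vector_signal(unsigned int vector, int fd, unsigned int index)
{
	if (index != MSI_IRQ_INDEX && index != MSIX_IRQ_INDEX)
		return -1;
	signals[vector] = fd;
	return 0;
}

/* Generic: forwards the index untouched and unwinds on failure. */
int set_block(unsigned int start, unsigned int count,
	      const int *fds, unsigned int index)
{
	unsigned int i, j;
	int ret = 0;

	for (i = 0, j = start; i < count && !ret; i++, j++)
		ret = set_vector_signal(j, fds ? fds[i] : -1, index);

	if (ret) {
		for (i = start; i < j; i++)
			set_vector_signal(i, -1, index);
	}
	return ret;
}
```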

2023-10-27 17:06:16

by Reinette Chatre

Subject: [RFC PATCH V3 17/26] vfio/pci: Make vfio_pci_set_irqs_ioctl() available

vfio_pci_set_irqs_ioctl() is now a generic entry point that can
be configured to support different interrupt management backends.

Export vfio_pci_set_irqs_ioctl() for use by other virtual device
drivers.

Signed-off-by: Reinette Chatre <[email protected]>
---
Changes since RFC V2:
- Improve changelog.

drivers/vfio/pci/vfio_pci_intrs.c | 1 +
include/linux/vfio_pci_core.h | 3 +++
2 files changed, 4 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 14131d5288e3..80040fde6f6b 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -916,3 +916,4 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
mutex_unlock(&intr_ctx->igate);
return ret;
}
+EXPORT_SYMBOL_GPL(vfio_pci_set_irqs_ioctl);
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index e666c19da223..8d2fb51a2dcc 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -158,6 +158,9 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
void vfio_pci_init_intr_ctx(struct vfio_pci_core_device *vdev,
struct vfio_pci_intr_ctx *intr_ctx);
void vfio_pci_release_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx);
+int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
+ unsigned int index, unsigned int start,
+ unsigned int count, void *data);
int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
void __user *arg, size_t argsz);
ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
--
2.34.1

2023-10-27 17:06:42

by Reinette Chatre

Subject: [RFC PATCH V3 19/26] vfio/pci: Store Linux IRQ number in per-interrupt context

The Linux IRQ number is a property shared among all interrupt
backends, but not all interrupt management backends have a simple
query for it. pci_irq_vector() can be used to obtain the Linux
IRQ number of an MSI-X interrupt, but there is no such
query for IMS interrupts.

The Linux IRQ number is needed when the interrupt is freed as well
as when the IRQ bypass producer is registered. It is unnecessary to
query the Linux IRQ number at each stage: the number can be
stored at the time the interrupt is allocated and obtained
from the per-interrupt context when needed.

Signed-off-by: Reinette Chatre <[email protected]>
---
Changes since RFC V2:
- New patch

drivers/vfio/pci/vfio_pci_intrs.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 8d84e7d62594..fd0713dc9f81 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -29,6 +29,7 @@ struct vfio_pci_irq_ctx {
char *name;
bool masked;
struct irq_bypass_producer producer;
+ int virq;
};

static bool irq_is(struct vfio_pci_intr_ctx *intr_ctx, int type)
@@ -431,10 +432,11 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,

if (ctx && ctx->trigger) {
irq_bypass_unregister_producer(&ctx->producer);
- irq = pci_irq_vector(pdev, vector);
+ irq = ctx->virq;
cmd = vfio_pci_memory_lock_and_enable(vdev);
- free_irq(irq, ctx->trigger);
+ free_irq(ctx->virq, ctx->trigger);
vfio_pci_memory_unlock_and_restore(vdev, cmd);
+ ctx->virq = 0;
/* Interrupt stays allocated, will be freed at MSI-X disable. */
kfree(ctx->name);
ctx->name = NULL;
@@ -488,8 +490,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
if (ret)
goto out_put_eventfd_ctx;

+ ctx->virq = irq;
+
ctx->producer.token = trigger;
- ctx->producer.irq = irq;
+ ctx->producer.irq = ctx->virq;
ret = irq_bypass_register_producer(&ctx->producer);
if (unlikely(ret)) {
dev_info(&pdev->dev,
--
2.34.1
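[Editor's note] The caching idea reduces to a few lines: record the Linux IRQ number in the per-interrupt context when the interrupt is requested, then consume the cached value on free. The allocator below is a fake placeholder and every name is illustrative.

```c
#include <assert.h>

struct irq_ctx {
	int virq;	/* cached Linux IRQ number, 0 when not requested */
};

/* Fake allocator standing in for pci_irq_vector()/IMS allocation. */
int fake_irq_vector(unsigned int vector)
{
	return 100 + (int)vector;
}

void request_interrupt(struct irq_ctx *ctx, unsigned int vector)
{
	ctx->virq = fake_irq_vector(vector);	/* cache at request time */
}

void free_interrupt(struct irq_ctx *ctx)
{
	/* the cached ctx->virq would be handed to free_irq() here */
	ctx->virq = 0;				/* mark as not requested */
}
```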

2023-10-27 17:07:05

by Reinette Chatre

Subject: [RFC PATCH V3 12/26] vfio/pci: Provide interrupt context to generic ops

The functions operating on the per-interrupt context were originally
created to support management of PCI device interrupts, where the
interrupt context was maintained within the virtual PCI device's
struct vfio_pci_core_device. Now that the per-interrupt context
has been moved to the more generic struct vfio_pci_intr_ctx, these
utilities can be changed to expect the generic structure instead.
This enables these utilities to be used by other interrupt
management backends.

Signed-off-by: Reinette Chatre <[email protected]>
---
No changes since RFC V2.

drivers/vfio/pci/vfio_pci_intrs.c | 41 ++++++++++++++++---------------
1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 9aff5c38f198..cdb6f875271f 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -49,21 +49,21 @@ static bool is_irq_none(struct vfio_pci_intr_ctx *intr_ctx)
}

static
-struct vfio_pci_irq_ctx *vfio_irq_ctx_get(struct vfio_pci_core_device *vdev,
+struct vfio_pci_irq_ctx *vfio_irq_ctx_get(struct vfio_pci_intr_ctx *intr_ctx,
unsigned long index)
{
- return xa_load(&vdev->intr_ctx.ctx, index);
+ return xa_load(&intr_ctx->ctx, index);
}

-static void vfio_irq_ctx_free(struct vfio_pci_core_device *vdev,
+static void vfio_irq_ctx_free(struct vfio_pci_intr_ctx *intr_ctx,
struct vfio_pci_irq_ctx *ctx, unsigned long index)
{
- xa_erase(&vdev->intr_ctx.ctx, index);
+ xa_erase(&intr_ctx->ctx, index);
kfree(ctx);
}

static struct vfio_pci_irq_ctx *
-vfio_irq_ctx_alloc(struct vfio_pci_core_device *vdev, unsigned long index)
+vfio_irq_ctx_alloc(struct vfio_pci_intr_ctx *intr_ctx, unsigned long index)
{
struct vfio_pci_irq_ctx *ctx;
int ret;
@@ -72,7 +72,7 @@ vfio_irq_ctx_alloc(struct vfio_pci_core_device *vdev, unsigned long index)
if (!ctx)
return NULL;

- ret = xa_insert(&vdev->intr_ctx.ctx, index, ctx, GFP_KERNEL_ACCOUNT);
+ ret = xa_insert(&intr_ctx->ctx, index, ctx, GFP_KERNEL_ACCOUNT);
if (ret) {
kfree(ctx);
return NULL;
@@ -91,7 +91,7 @@ static void vfio_send_intx_eventfd(void *opaque, void *unused)
if (likely(is_intx(vdev) && !vdev->virq_disabled)) {
struct vfio_pci_irq_ctx *ctx;

- ctx = vfio_irq_ctx_get(vdev, 0);
+ ctx = vfio_irq_ctx_get(&vdev->intr_ctx, 0);
if (WARN_ON_ONCE(!ctx))
return;
eventfd_signal(ctx->trigger, 1);
@@ -120,7 +120,7 @@ bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
goto out_unlock;
}

- ctx = vfio_irq_ctx_get(vdev, 0);
+ ctx = vfio_irq_ctx_get(&vdev->intr_ctx, 0);
if (WARN_ON_ONCE(!ctx))
goto out_unlock;

@@ -169,7 +169,7 @@ static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
goto out_unlock;
}

- ctx = vfio_irq_ctx_get(vdev, 0);
+ ctx = vfio_irq_ctx_get(&vdev->intr_ctx, 0);
if (WARN_ON_ONCE(!ctx))
goto out_unlock;

@@ -207,7 +207,7 @@ static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
unsigned long flags;
int ret = IRQ_NONE;

- ctx = vfio_irq_ctx_get(vdev, 0);
+ ctx = vfio_irq_ctx_get(&vdev->intr_ctx, 0);
if (WARN_ON_ONCE(!ctx))
return ret;

@@ -241,7 +241,7 @@ static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
if (!vdev->pdev->irq)
return -ENODEV;

- ctx = vfio_irq_ctx_alloc(vdev, 0);
+ ctx = vfio_irq_ctx_alloc(&vdev->intr_ctx, 0);
if (!ctx)
return -ENOMEM;

@@ -269,7 +269,7 @@ static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd)
unsigned long flags;
int ret;

- ctx = vfio_irq_ctx_get(vdev, 0);
+ ctx = vfio_irq_ctx_get(&vdev->intr_ctx, 0);
if (WARN_ON_ONCE(!ctx))
return -EINVAL;

@@ -324,7 +324,7 @@ static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
{
struct vfio_pci_irq_ctx *ctx;

- ctx = vfio_irq_ctx_get(vdev, 0);
+ ctx = vfio_irq_ctx_get(&vdev->intr_ctx, 0);
WARN_ON_ONCE(!ctx);
if (ctx) {
vfio_virqfd_disable(&ctx->unmask);
@@ -332,7 +332,7 @@ static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
}
vfio_intx_set_signal(vdev, -1);
vdev->intr_ctx.irq_type = VFIO_PCI_NUM_IRQS;
- vfio_irq_ctx_free(vdev, ctx, 0);
+ vfio_irq_ctx_free(&vdev->intr_ctx, ctx, 0);
}

/*
@@ -421,7 +421,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
int irq = -EINVAL, ret;
u16 cmd;

- ctx = vfio_irq_ctx_get(vdev, vector);
+ ctx = vfio_irq_ctx_get(&vdev->intr_ctx, vector);

if (ctx) {
irq_bypass_unregister_producer(&ctx->producer);
@@ -432,7 +432,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
/* Interrupt stays allocated, will be freed at MSI-X disable. */
kfree(ctx->name);
eventfd_ctx_put(ctx->trigger);
- vfio_irq_ctx_free(vdev, ctx, vector);
+ vfio_irq_ctx_free(&vdev->intr_ctx, ctx, vector);
}

if (fd < 0)
@@ -445,7 +445,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
return irq;
}

- ctx = vfio_irq_ctx_alloc(vdev, vector);
+ ctx = vfio_irq_ctx_alloc(&vdev->intr_ctx, vector);
if (!ctx)
return -ENOMEM;

@@ -499,7 +499,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
out_free_name:
kfree(ctx->name);
out_free_ctx:
- vfio_irq_ctx_free(vdev, ctx, vector);
+ vfio_irq_ctx_free(&vdev->intr_ctx, ctx, vector);
return ret;
}

@@ -570,7 +570,8 @@ static int vfio_pci_set_intx_unmask(struct vfio_pci_intr_ctx *intr_ctx,
if (unmask)
vfio_pci_intx_unmask(vdev);
} else if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
- struct vfio_pci_irq_ctx *ctx = vfio_irq_ctx_get(vdev, 0);
+ struct vfio_pci_irq_ctx *ctx = vfio_irq_ctx_get(&vdev->intr_ctx,
+ 0);
int32_t fd = *(int32_t *)data;

if (WARN_ON_ONCE(!ctx))
@@ -696,7 +697,7 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,
return -EINVAL;

for (i = start; i < start + count; i++) {
- ctx = vfio_irq_ctx_get(vdev, i);
+ ctx = vfio_irq_ctx_get(&vdev->intr_ctx, i);
if (!ctx)
continue;
if (flags & VFIO_IRQ_SET_DATA_NONE) {
--
2.34.1

2023-10-27 17:07:12

by Reinette Chatre

Subject: [RFC PATCH V3 18/26] vfio/pci: Preserve per-interrupt contexts

Interrupt management for PCI passthrough devices creates a new
per-interrupt context every time an interrupt is allocated, freeing
it when the interrupt is freed.

The per-interrupt context contains the properties of a particular
interrupt. Without a property that guides interrupt allocation and
freeing, it is acceptable to always create a new per-interrupt
context.

Maintain the per-interrupt context across interrupt allocation and
free events in preparation for per-interrupt properties that guide
interrupt allocation and freeing. Examples of such properties are:
(a) whether the interrupt is emulated, which guides whether
the backend should indeed allocate and/or free an interrupt; (b)
an instance cookie associated with the interrupt that needs to be
provided to interrupt allocation when the interrupt is backed by IMS.

This means that the existence of a per-interrupt context no longer
implies a valid trigger, pointers to freed memory must be cleared, and
a new per-interrupt context cannot be assumed to need allocation when
an interrupt is allocated.

Signed-off-by: Reinette Chatre <[email protected]>
---
Changes since RFC V2:
- New patch

drivers/vfio/pci/vfio_pci_intrs.c | 41 ++++++++++++++++++++++---------
1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 80040fde6f6b..8d84e7d62594 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -429,7 +429,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,

ctx = vfio_irq_ctx_get(intr_ctx, vector);

- if (ctx) {
+ if (ctx && ctx->trigger) {
irq_bypass_unregister_producer(&ctx->producer);
irq = pci_irq_vector(pdev, vector);
cmd = vfio_pci_memory_lock_and_enable(vdev);
@@ -437,8 +437,9 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
vfio_pci_memory_unlock_and_restore(vdev, cmd);
/* Interrupt stays allocated, will be freed at MSI-X disable. */
kfree(ctx->name);
+ ctx->name = NULL;
eventfd_ctx_put(ctx->trigger);
- vfio_irq_ctx_free(intr_ctx, ctx, vector);
+ ctx->trigger = NULL;
}

if (fd < 0)
@@ -451,16 +452,17 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
return irq;
}

- ctx = vfio_irq_ctx_alloc(intr_ctx, vector);
- if (!ctx)
- return -ENOMEM;
+ /* Per-interrupt context remain allocated. */
+ if (!ctx) {
+ ctx = vfio_irq_ctx_alloc(intr_ctx, vector);
+ if (!ctx)
+ return -ENOMEM;
+ }

ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-msi%s[%d](%s)",
msix ? "x" : "", vector, pci_name(pdev));
- if (!ctx->name) {
- ret = -ENOMEM;
- goto out_free_ctx;
- }
+ if (!ctx->name)
+ return -ENOMEM;

trigger = eventfd_ctx_fdget(fd);
if (IS_ERR(trigger)) {
@@ -504,8 +506,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
eventfd_ctx_put(trigger);
out_free_name:
kfree(ctx->name);
-out_free_ctx:
- vfio_irq_ctx_free(intr_ctx, ctx, vector);
+ ctx->name = NULL;
return ret;
}

@@ -704,7 +705,7 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,

for (i = start; i < start + count; i++) {
ctx = vfio_irq_ctx_get(intr_ctx, i);
- if (!ctx)
+ if (!ctx || !ctx->trigger)
continue;
if (flags & VFIO_IRQ_SET_DATA_NONE) {
eventfd_signal(ctx->trigger, 1);
@@ -810,6 +811,22 @@ static void _vfio_pci_init_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx)

static void _vfio_pci_release_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx)
{
+ struct vfio_pci_irq_ctx *ctx;
+ unsigned long i;
+
+ /*
+ * Per-interrupt context remains allocated after interrupt is
+ * freed. Per-interrupt context need to be freed separately.
+ */
+ mutex_lock(&intr_ctx->igate);
+ xa_for_each(&intr_ctx->ctx, i, ctx) {
+ WARN_ON_ONCE(ctx->trigger);
+ WARN_ON_ONCE(ctx->name);
+ xa_erase(&intr_ctx->ctx, i);
+ kfree(ctx);
+ }
+ mutex_unlock(&intr_ctx->igate);
+
mutex_destroy(&intr_ctx->igate);
}

--
2.34.1

2023-10-27 17:07:13

by Reinette Chatre

Subject: [RFC PATCH V3 20/26] vfio/pci: Separate frontend and backend code during interrupt enable/disable

vfio_msi_set_vector_signal() contains a mix of generic and
backend-specific code.

Separate the backend-specific code into functions that can be
replaced by backend-specific callbacks.

The dev_info() used in the error message is replaced by a pr_info()
that prints the device name generated by the backend-specific code,
which is intended to be used during request_irq().

Signed-off-by: Reinette Chatre <[email protected]>
---
Changes since RFC V2:
- New patch

drivers/vfio/pci/vfio_pci_intrs.c | 110 +++++++++++++++++++-----------
1 file changed, 70 insertions(+), 40 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index fd0713dc9f81..c1f65b8adfe2 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -416,28 +416,81 @@ static int vfio_msi_alloc_irq(struct vfio_pci_core_device *vdev,
return map.index < 0 ? map.index : map.virq;
}

-static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
- unsigned int vector, int fd,
+static void vfio_msi_free_interrupt(struct vfio_pci_intr_ctx *intr_ctx,
+ struct vfio_pci_irq_ctx *ctx,
+ unsigned int vector)
+{
+ struct vfio_pci_core_device *vdev = intr_ctx->priv;
+ u16 cmd;
+
+ cmd = vfio_pci_memory_lock_and_enable(vdev);
+ free_irq(ctx->virq, ctx->trigger);
+ vfio_pci_memory_unlock_and_restore(vdev, cmd);
+ ctx->virq = 0;
+ /* Interrupt stays allocated, will be freed at MSI-X disable. */
+}
+
+static int vfio_msi_request_interrupt(struct vfio_pci_intr_ctx *intr_ctx,
+ struct vfio_pci_irq_ctx *ctx,
+ unsigned int vector,
unsigned int index)
+{
+ bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
+ struct vfio_pci_core_device *vdev = intr_ctx->priv;
+ int irq, ret;
+ u16 cmd;
+
+ /* Interrupt stays allocated, will be freed at MSI-X disable. */
+ irq = vfio_msi_alloc_irq(vdev, vector, msix);
+ if (irq < 0)
+ return irq;
+
+ /*
+ * If the vector was previously allocated, refresh the on-device
+ * message data before enabling in case it had been cleared or
+ * corrupted (e.g. due to backdoor resets) since writing.
+ */
+ cmd = vfio_pci_memory_lock_and_enable(vdev);
+ if (msix) {
+ struct msi_msg msg;
+
+ get_cached_msi_msg(irq, &msg);
+ pci_write_msi_msg(irq, &msg);
+ }
+
+ ret = request_irq(irq, vfio_msihandler, 0, ctx->name, ctx->trigger);
+ vfio_pci_memory_unlock_and_restore(vdev, cmd);
+
+ ctx->virq = irq;
+
+ return ret;
+}
+
+static char *vfio_msi_device_name(struct vfio_pci_intr_ctx *intr_ctx,
+ unsigned int vector,
+ unsigned int index)
{
bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
struct vfio_pci_core_device *vdev = intr_ctx->priv;
struct pci_dev *pdev = vdev->pdev;
+
+ return kasprintf(GFP_KERNEL_ACCOUNT, "vfio-msi%s[%d](%s)",
+ msix ? "x" : "", vector, pci_name(pdev));
+}
+
+static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
+ unsigned int vector, int fd,
+ unsigned int index)
+{
struct vfio_pci_irq_ctx *ctx;
struct eventfd_ctx *trigger;
- int irq = -EINVAL, ret;
- u16 cmd;
+ int ret;

ctx = vfio_irq_ctx_get(intr_ctx, vector);

if (ctx && ctx->trigger) {
irq_bypass_unregister_producer(&ctx->producer);
- irq = ctx->virq;
- cmd = vfio_pci_memory_lock_and_enable(vdev);
- free_irq(ctx->virq, ctx->trigger);
- vfio_pci_memory_unlock_and_restore(vdev, cmd);
- ctx->virq = 0;
- /* Interrupt stays allocated, will be freed at MSI-X disable. */
+ vfio_msi_free_interrupt(intr_ctx, ctx, vector);
kfree(ctx->name);
ctx->name = NULL;
eventfd_ctx_put(ctx->trigger);
@@ -447,13 +500,6 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
if (fd < 0)
return 0;

- if (irq == -EINVAL) {
- /* Interrupt stays allocated, will be freed at MSI-X disable. */
- irq = vfio_msi_alloc_irq(vdev, vector, msix);
- if (irq < 0)
- return irq;
- }
-
/* Per-interrupt context remain allocated. */
if (!ctx) {
ctx = vfio_irq_ctx_alloc(intr_ctx, vector);
@@ -461,8 +507,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
return -ENOMEM;
}

- ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-msi%s[%d](%s)",
- msix ? "x" : "", vector, pci_name(pdev));
+ ctx->name = vfio_msi_device_name(intr_ctx, vector, index);
if (!ctx->name)
return -ENOMEM;

@@ -472,42 +517,27 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
goto out_free_name;
}

- /*
- * If the vector was previously allocated, refresh the on-device
- * message data before enabling in case it had been cleared or
- * corrupted (e.g. due to backdoor resets) since writing.
- */
- cmd = vfio_pci_memory_lock_and_enable(vdev);
- if (msix) {
- struct msi_msg msg;
-
- get_cached_msi_msg(irq, &msg);
- pci_write_msi_msg(irq, &msg);
- }
+ ctx->trigger = trigger;

- ret = request_irq(irq, vfio_msihandler, 0, ctx->name, trigger);
- vfio_pci_memory_unlock_and_restore(vdev, cmd);
+ ret = vfio_msi_request_interrupt(intr_ctx, ctx, vector, index);
if (ret)
goto out_put_eventfd_ctx;

- ctx->virq = irq;
-
ctx->producer.token = trigger;
ctx->producer.irq = ctx->virq;
ret = irq_bypass_register_producer(&ctx->producer);
if (unlikely(ret)) {
- dev_info(&pdev->dev,
- "irq bypass producer (token %p) registration fails: %d\n",
- ctx->producer.token, ret);
+ pr_info("%s irq bypass producer (token %p) registration fails: %d\n",
+ ctx->name, ctx->producer.token, ret);

ctx->producer.token = NULL;
}
- ctx->trigger = trigger;

return 0;

out_put_eventfd_ctx:
- eventfd_ctx_put(trigger);
+ eventfd_ctx_put(ctx->trigger);
+ ctx->trigger = NULL;
out_free_name:
kfree(ctx->name);
ctx->name = NULL;
--
2.34.1

2023-10-27 17:07:19

by Reinette Chatre

[permalink] [raw]
Subject: [RFC PATCH V3 21/26] vfio/pci: Replace backend specific calls with callbacks

The backend-specific code needed to manage interrupts has been
isolated into separate functions. Turn these functions into
callbacks so that other backends can provide their own
implementations.
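The callback indirection introduced here can be pictured with a small
userspace model (names simplified and purely illustrative; the real
structures are vfio_pci_intr_ctx and vfio_pci_intr_ops): the frontend only
ever dispatches through an ops pointer, and each backend supplies its own
implementations.

```c
/* Illustrative model of the frontend/backend split: the frontend holds
 * an ops pointer and never calls a backend function directly. All names
 * here are simplified stand-ins, not the kernel's. */
struct intr_ctx;

struct intr_ops {
	int (*msi_enable)(struct intr_ctx *ctx, int nvec);
	void (*msi_disable)(struct intr_ctx *ctx);
};

struct intr_ctx {
	const struct intr_ops *ops;
	int enabled_vectors;
};

/* One backend (think: PCI passthrough) fills in the callbacks. */
static int pci_msi_enable(struct intr_ctx *ctx, int nvec)
{
	ctx->enabled_vectors = nvec;
	return 0;
}

static void pci_msi_disable(struct intr_ctx *ctx)
{
	ctx->enabled_vectors = 0;
}

static const struct intr_ops pci_intr_ops = {
	.msi_enable = pci_msi_enable,
	.msi_disable = pci_msi_disable,
};

/* Frontend code dispatches through the ops pointer only, so another
 * backend (e.g. IMS) can be swapped in without frontend changes. */
static int frontend_enable(struct intr_ctx *ctx, int nvec)
{
	return ctx->ops->msi_enable(ctx, nvec);
}
```

A second backend would simply provide a different intr_ops instance; the
frontend_enable() call site is unchanged.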

Signed-off-by: Reinette Chatre <[email protected]>
---
Changes since RFC V2:
- New patch

drivers/vfio/pci/vfio_pci_intrs.c | 17 +++++++++++------
include/linux/vfio_pci_core.h | 15 +++++++++++++++
2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index c1f65b8adfe2..1e6376b048de 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -490,7 +490,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,

if (ctx && ctx->trigger) {
irq_bypass_unregister_producer(&ctx->producer);
- vfio_msi_free_interrupt(intr_ctx, ctx, vector);
+ intr_ctx->ops->msi_free_interrupt(intr_ctx, ctx, vector);
kfree(ctx->name);
ctx->name = NULL;
eventfd_ctx_put(ctx->trigger);
@@ -507,7 +507,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
return -ENOMEM;
}

- ctx->name = vfio_msi_device_name(intr_ctx, vector, index);
+ ctx->name = intr_ctx->ops->msi_device_name(intr_ctx, vector, index);
if (!ctx->name)
return -ENOMEM;

@@ -519,7 +519,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,

ctx->trigger = trigger;

- ret = vfio_msi_request_interrupt(intr_ctx, ctx, vector, index);
+ ret = intr_ctx->ops->msi_request_interrupt(intr_ctx, ctx, vector, index);
if (ret)
goto out_put_eventfd_ctx;

@@ -708,7 +708,7 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,
unsigned int i;

if (irq_is(intr_ctx, index) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
- vfio_msi_disable(intr_ctx, index);
+ intr_ctx->ops->msi_disable(intr_ctx, index);
return 0;
}

@@ -723,13 +723,13 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_intr_ctx *intr_ctx,
return vfio_msi_set_block(intr_ctx, start, count,
fds, index);

- ret = vfio_msi_enable(intr_ctx, start + count, index);
+ ret = intr_ctx->ops->msi_enable(intr_ctx, start + count, index);
if (ret)
return ret;

ret = vfio_msi_set_block(intr_ctx, start, count, fds, index);
if (ret)
- vfio_msi_disable(intr_ctx, index);
+ intr_ctx->ops->msi_disable(intr_ctx, index);

return ret;
}
@@ -872,6 +872,11 @@ static struct vfio_pci_intr_ops vfio_pci_intr_ops = {
.set_msix_trigger = vfio_pci_set_msi_trigger,
.set_err_trigger = vfio_pci_set_err_trigger,
.set_req_trigger = vfio_pci_set_req_trigger,
+ .msi_enable = vfio_msi_enable,
+ .msi_disable = vfio_msi_disable,
+ .msi_request_interrupt = vfio_msi_request_interrupt,
+ .msi_free_interrupt = vfio_msi_free_interrupt,
+ .msi_device_name = vfio_msi_device_name,
};

void vfio_pci_init_intr_ctx(struct vfio_pci_core_device *vdev,
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 8d2fb51a2dcc..f0951084a26f 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -69,6 +69,8 @@ struct vfio_pci_intr_ctx {
int irq_type;
};

+struct vfio_pci_irq_ctx;
+
struct vfio_pci_intr_ops {
int (*set_intx_mask)(struct vfio_pci_intr_ctx *intr_ctx,
unsigned int index, unsigned int start,
@@ -91,6 +93,19 @@ struct vfio_pci_intr_ops {
int (*set_req_trigger)(struct vfio_pci_intr_ctx *intr_ctx,
unsigned int index, unsigned int start,
unsigned int count, uint32_t flags, void *data);
+ int (*msi_enable)(struct vfio_pci_intr_ctx *intr_ctx, int nvec,
+ unsigned int index);
+ void (*msi_disable)(struct vfio_pci_intr_ctx *intr_ctx,
+ unsigned int index);
+ int (*msi_request_interrupt)(struct vfio_pci_intr_ctx *intr_ctx,
+ struct vfio_pci_irq_ctx *ctx,
+ unsigned int vector,
+ unsigned int index);
+ void (*msi_free_interrupt)(struct vfio_pci_intr_ctx *intr_ctx,
+ struct vfio_pci_irq_ctx *ctx,
+ unsigned int vector);
+ char *(*msi_device_name)(struct vfio_pci_intr_ctx *intr_ctx,
+ unsigned int vector, unsigned int index);
};

struct vfio_pci_core_device {
--
2.34.1

2023-10-27 17:07:38

by Reinette Chatre

[permalink] [raw]
Subject: [RFC PATCH V3 25/26] vfio/pci: Add accessor for IMS index

A virtual device driver needs to translate between a guest's MSI-X
interrupt and the backing IMS interrupt with which the physical
device is programmed. For example, the guest may need to obtain from
the virtual device driver the IMS index that it must program into
descriptors submitted to the device to ensure that the completion
interrupts are generated correctly.

Introduce vfio_pci_ims_hwirq() to the IMS backend as a helper
that returns the IMS interrupt index backing a provided MSI-X
interrupt index belonging to a guest.
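The error paths of the lookup can be modeled in plain userspace C
(illustrative only; the kernel implementation below takes the igate mutex
and looks the context up in an xarray):

```c
/* Userspace model of the IMS index lookup: returns -EINVAL (-22) when
 * the interrupt context is not IMS-backed, the vector has no context,
 * or the vector is emulated; otherwise the IMS index backing the MSI-X
 * vector. All structures are simplified stand-ins. */
#define MODEL_EINVAL (-22)
#define MODEL_NVEC 8

struct irq_ctx {
	int present;
	int emulated;
	int ims_id;
};

struct intr_ctx {
	int ims_backed_irq;
	struct irq_ctx vectors[MODEL_NVEC];
};

static int ims_hwirq(struct intr_ctx *ictx, unsigned int vector)
{
	struct irq_ctx *ctx;

	if (!ictx->ims_backed_irq || vector >= MODEL_NVEC)
		return MODEL_EINVAL;

	ctx = &ictx->vectors[vector];
	if (!ctx->present || ctx->emulated)
		return MODEL_EINVAL;

	return ctx->ims_id;
}
```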

Originally-by: Dave Jiang <[email protected]>
Signed-off-by: Reinette Chatre <[email protected]>
---
No changes since RFC V2.

drivers/vfio/pci/vfio_pci_intrs.c | 25 +++++++++++++++++++++++++
include/linux/vfio_pci_core.h | 1 +
2 files changed, 26 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index b318a3f671e8..32ebc8fec4c4 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -1163,6 +1163,31 @@ void vfio_pci_ims_release_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx)
}
EXPORT_SYMBOL_GPL(vfio_pci_ims_release_intr_ctx);

+/*
+ * Return IMS index of IMS interrupt backing MSI-X interrupt @vector
+ */
+int vfio_pci_ims_hwirq(struct vfio_pci_intr_ctx *intr_ctx, unsigned int vector)
+{
+ struct vfio_pci_irq_ctx *ctx;
+ int id = -EINVAL;
+
+ mutex_lock(&intr_ctx->igate);
+
+ if (!intr_ctx->ims_backed_irq)
+ goto out_unlock;
+
+ ctx = vfio_irq_ctx_get(intr_ctx, vector);
+ if (!ctx || ctx->emulated)
+ goto out_unlock;
+
+ id = ctx->ims_id;
+
+out_unlock:
+ mutex_unlock(&intr_ctx->igate);
+ return id;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_ims_hwirq);
+
int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
unsigned int index, unsigned int start,
unsigned int count, void *data)
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index a3161af791f8..dbc77839ef26 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -180,6 +180,7 @@ void vfio_pci_release_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx);
int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
unsigned int index, unsigned int start,
unsigned int count, void *data);
+int vfio_pci_ims_hwirq(struct vfio_pci_intr_ctx *intr_ctx, unsigned int vector);
void vfio_pci_send_signal(struct vfio_pci_intr_ctx *intr_ctx, unsigned int vector);
int vfio_pci_set_emulated(struct vfio_pci_intr_ctx *intr_ctx,
unsigned int start, unsigned int count);
--
2.34.1

2023-10-27 17:07:46

by Reinette Chatre

[permalink] [raw]
Subject: [RFC PATCH V3 24/26] vfio/pci: Add core IMS support

Add a new interrupt management backend enabling a guest MSI-X
interrupt to be backed by an IMS interrupt on the host.

An IMS interrupt is allocated via pci_ims_alloc_irq() and requires
an implementation-specific cookie that is opaque to the IMS backend.
This can be a PASID, queue ID, pointer, etc. During initialization
the IMS backend learns which PCI device to operate on (and thus which
interrupt domain to allocate from) and what the default cookie should
be for any new interrupt allocation.

A virtual device driver starts by initializing the backend with the
new vfio_pci_ims_init_intr_ctx() and cleans up with the new
vfio_pci_ims_release_intr_ctx(). Once initialized, the virtual
device driver can call vfio_pci_set_irqs_ioctl() to handle the
VFIO_DEVICE_SET_IRQS ioctl() after it has validated that the
parameters are appropriate for the particular device.

To support the IMS backend the core utilities need to be aware of
which backend an interrupt context belongs to. The new ims_backed_irq
flag enables this: it is false for the PCI passthrough backend and
true for the IMS backend.

Signed-off-by: Reinette Chatre <[email protected]>
---
Changes since RFC V2:
- Improve changelog.
- Refactored implementation to use new callbacks for interrupt
enable/disable and allocate/free to eliminate code duplication. (Kevin)
- Make vfio_pci_ims_intr_ops static.

drivers/vfio/pci/vfio_pci_intrs.c | 178 ++++++++++++++++++++++++++++++
include/linux/vfio_pci_core.h | 7 ++
2 files changed, 185 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 6e34b8d8c216..b318a3f671e8 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -22,6 +22,21 @@

#include "vfio_pci_priv.h"

+/*
+ * Interrupt Message Store (IMS) private interrupt context data
+ * @vdev: Virtual device. Used for name of device in
+ * request_irq().
+ * @pdev: PCI device owning the IMS domain from where
+ * interrupts are allocated.
+ * @default_cookie: Default cookie used for IMS interrupts without unique
+ * cookie.
+ */
+struct vfio_pci_ims {
+ struct vfio_device *vdev;
+ struct pci_dev *pdev;
+ union msi_instance_cookie default_cookie;
+};
+
struct vfio_pci_irq_ctx {
bool emulated:1;
struct eventfd_ctx *trigger;
@@ -31,6 +46,8 @@ struct vfio_pci_irq_ctx {
bool masked;
struct irq_bypass_producer producer;
int virq;
+ int ims_id;
+ union msi_instance_cookie icookie;
};

static bool irq_is(struct vfio_pci_intr_ctx *intr_ctx, int type)
@@ -899,6 +916,7 @@ void vfio_pci_init_intr_ctx(struct vfio_pci_core_device *vdev,
_vfio_pci_init_intr_ctx(intr_ctx);
intr_ctx->ops = &vfio_pci_intr_ops;
intr_ctx->priv = vdev;
+ intr_ctx->ims_backed_irq = false;
}
EXPORT_SYMBOL_GPL(vfio_pci_init_intr_ctx);

@@ -985,6 +1003,166 @@ int vfio_pci_set_emulated(struct vfio_pci_intr_ctx *intr_ctx,
}
EXPORT_SYMBOL_GPL(vfio_pci_set_emulated);

+/* Guest MSI-X interrupts backed by IMS host interrupts */
+
+/*
+ * Free the IMS interrupt associated with @ctx.
+ *
+ * For an IMS interrupt the interrupt is freed from the underlying
+ * PCI device's IMS domain.
+ */
+static void vfio_pci_ims_irq_free(struct vfio_pci_intr_ctx *intr_ctx,
+ struct vfio_pci_irq_ctx *ctx)
+{
+ struct vfio_pci_ims *ims = intr_ctx->priv;
+ struct msi_map irq_map = {};
+
+ irq_map.index = ctx->ims_id;
+ irq_map.virq = ctx->virq;
+ pci_ims_free_irq(ims->pdev, irq_map);
+ ctx->ims_id = -EINVAL;
+ ctx->virq = 0;
+}
+
+/*
+ * Allocate a host IMS interrupt for @ctx.
+ *
+ * For an IMS interrupt the interrupt is allocated from the underlying
+ * PCI device's IMS domain.
+ */
+static int vfio_pci_ims_irq_alloc(struct vfio_pci_intr_ctx *intr_ctx,
+ struct vfio_pci_irq_ctx *ctx)
+{
+ struct vfio_pci_ims *ims = intr_ctx->priv;
+ struct msi_map irq_map = {};
+
+ irq_map = pci_ims_alloc_irq(ims->pdev, &ctx->icookie, NULL);
+ if (irq_map.index < 0)
+ return irq_map.index;
+
+ ctx->ims_id = irq_map.index;
+ ctx->virq = irq_map.virq;
+
+ return 0;
+}
+
+static void vfio_ims_free_interrupt(struct vfio_pci_intr_ctx *intr_ctx,
+ struct vfio_pci_irq_ctx *ctx,
+ unsigned int vector)
+{
+ free_irq(ctx->virq, ctx->trigger);
+ vfio_pci_ims_irq_free(intr_ctx, ctx);
+}
+
+static int vfio_ims_request_interrupt(struct vfio_pci_intr_ctx *intr_ctx,
+ struct vfio_pci_irq_ctx *ctx,
+ unsigned int vector,
+ unsigned int index)
+{
+ int ret;
+
+ ret = vfio_pci_ims_irq_alloc(intr_ctx, ctx);
+ if (ret < 0)
+ return ret;
+
+ ret = request_irq(ctx->virq, vfio_msihandler, 0, ctx->name,
+ ctx->trigger);
+ if (ret < 0) {
+ vfio_pci_ims_irq_free(intr_ctx, ctx);
+ return ret;
+ }
+
+ return 0;
+}
+
+static char *vfio_ims_device_name(struct vfio_pci_intr_ctx *intr_ctx,
+ unsigned int vector,
+ unsigned int index)
+{
+ struct vfio_pci_ims *ims = intr_ctx->priv;
+ struct device *dev = &ims->vdev->device;
+
+ return kasprintf(GFP_KERNEL, "vfio-ims[%d](%s)", vector, dev_name(dev));
+}
+
+static void vfio_ims_disable(struct vfio_pci_intr_ctx *intr_ctx,
+ unsigned int index)
+{
+ struct vfio_pci_irq_ctx *ctx;
+ unsigned long i;
+
+ xa_for_each(&intr_ctx->ctx, i, ctx)
+ vfio_msi_set_vector_signal(intr_ctx, i, -1, index);
+}
+
+/*
+ * The virtual device driver is responsible for enabling IMS by creating
+ * the IMS domain from where interrupts will be allocated dynamically.
+ * IMS thus has to be enabled by the time an ioctl() arrives.
+ */
+static int vfio_ims_enable(struct vfio_pci_intr_ctx *intr_ctx, int nvec,
+ unsigned int index)
+{
+ return -EINVAL;
+}
+
+static int vfio_ims_init_irq_ctx(struct vfio_pci_intr_ctx *intr_ctx,
+ struct vfio_pci_irq_ctx *ctx)
+{
+ struct vfio_pci_ims *ims = intr_ctx->priv;
+
+ ctx->icookie = ims->default_cookie;
+
+ return 0;
+}
+
+static struct vfio_pci_intr_ops vfio_pci_ims_intr_ops = {
+ .set_msix_trigger = vfio_pci_set_msi_trigger,
+ .set_req_trigger = vfio_pci_set_req_trigger,
+ .msi_enable = vfio_ims_enable,
+ .msi_disable = vfio_ims_disable,
+ .msi_request_interrupt = vfio_ims_request_interrupt,
+ .msi_free_interrupt = vfio_ims_free_interrupt,
+ .msi_device_name = vfio_ims_device_name,
+ .init_irq_ctx = vfio_ims_init_irq_ctx,
+};
+
+int vfio_pci_ims_init_intr_ctx(struct vfio_device *vdev,
+ struct vfio_pci_intr_ctx *intr_ctx,
+ struct pci_dev *pdev,
+ union msi_instance_cookie *default_cookie)
+{
+ struct vfio_pci_ims *ims;
+
+ ims = kzalloc(sizeof(*ims), GFP_KERNEL_ACCOUNT);
+ if (!ims)
+ return -ENOMEM;
+
+ ims->pdev = pdev;
+ ims->default_cookie = *default_cookie;
+ ims->vdev = vdev;
+
+ _vfio_pci_init_intr_ctx(intr_ctx);
+
+ intr_ctx->ops = &vfio_pci_ims_intr_ops;
+ intr_ctx->priv = ims;
+ intr_ctx->ims_backed_irq = true;
+ intr_ctx->irq_type = VFIO_PCI_MSIX_IRQ_INDEX;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_ims_init_intr_ctx);
+
+void vfio_pci_ims_release_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx)
+{
+ struct vfio_pci_ims *ims = intr_ctx->priv;
+
+ _vfio_pci_release_intr_ctx(intr_ctx);
+ kfree(ims);
+ intr_ctx->irq_type = VFIO_PCI_NUM_IRQS;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_ims_release_intr_ctx);
+
int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
unsigned int index, unsigned int start,
unsigned int count, void *data)
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 4fe0df25162f..a3161af791f8 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -58,6 +58,7 @@ struct vfio_pci_region {
* @req_trigger: Eventfd associated with device request notification
* @ctx: Per-interrupt context indexed by vector
* @irq_type: Type of interrupt from guest perspective
+ * @ims_backed_irq: Interrupts managed by IMS backend
*/
struct vfio_pci_intr_ctx {
const struct vfio_pci_intr_ops *ops;
@@ -67,6 +68,7 @@ struct vfio_pci_intr_ctx {
struct eventfd_ctx *req_trigger;
struct xarray ctx;
int irq_type;
+ bool ims_backed_irq:1;
};

struct vfio_pci_irq_ctx;
@@ -181,6 +183,11 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
void vfio_pci_send_signal(struct vfio_pci_intr_ctx *intr_ctx, unsigned int vector);
int vfio_pci_set_emulated(struct vfio_pci_intr_ctx *intr_ctx,
unsigned int start, unsigned int count);
+int vfio_pci_ims_init_intr_ctx(struct vfio_device *vdev,
+ struct vfio_pci_intr_ctx *intr_ctx,
+ struct pci_dev *pdev,
+ union msi_instance_cookie *default_cookie);
+void vfio_pci_ims_release_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx);
int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
void __user *arg, size_t argsz);
ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
--
2.34.1

2023-10-27 17:07:58

by Reinette Chatre

[permalink] [raw]
Subject: [RFC PATCH V3 23/26] vfio/pci: Support emulated interrupts

Access from a guest to a virtual device may be either 'direct-path',
where the guest interacts directly with the underlying hardware,
or 'intercepted path' where the virtual device emulates operations.

Support emulated interrupts that can be used to handle 'intercepted
path' operations. For example, a virtual device may use the
'intercepted path' for configuration. In that case, configuration
requests intercepted by the virtual device driver are handled entirely
within the virtual device driver, with completion signaled to the
guest without interacting with the underlying hardware.

Add vfio_pci_set_emulated() and vfio_pci_send_signal() to the
VFIO PCI API. vfio_pci_set_emulated() configures a range of interrupts
to be emulated.

Any range of interrupts can be configured as emulated as long as no
interrupt has previously been allocated at that vector. The virtual
device driver uses vfio_pci_send_signal() to trigger interrupts in
the guest.
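The all-or-nothing behavior of vfio_pci_set_emulated() can be sketched
with a simplified userspace model (illustrative only; the kernel code
below allocates per-interrupt contexts and inserts them into an xarray
under the igate mutex):

```c
/* Simplified model: slots[] stands in for the per-vector xarray. A
 * vector is claimed as emulated only if it is free; on any failure the
 * entire range is rolled back, so there is no partial success. */
#define MODEL_NVEC 16
#define MODEL_EBUSY (-16)

static int set_emulated(char *slots, unsigned int start, unsigned int count)
{
	unsigned int i, j;

	for (i = start; i < start + count; i++) {
		if (i >= MODEL_NVEC || slots[i]) {
			/* Roll back everything claimed so far. */
			for (j = start; j < i; j++)
				slots[j] = 0;
			return MODEL_EBUSY;
		}
		slots[i] = 1;	/* mark vector as emulated */
	}
	return 0;
}
```

Note that a failed call leaves vectors claimed by *earlier* successful
calls untouched; only the range of the failing call is unwound.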

Originally-by: Dave Jiang <[email protected]>
Signed-off-by: Reinette Chatre <[email protected]>
---
Changes since RFC V2:
- Remove the backend "supports_emulated" flag. All backends now support
emulated interrupts.
- Move emulated interrupt enabling from IMS backend to frontend.

drivers/vfio/pci/vfio_pci_intrs.c | 87 ++++++++++++++++++++++++++++++-
include/linux/vfio_pci_core.h | 3 ++
2 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 8c86f2d6229f..6e34b8d8c216 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -23,6 +23,7 @@
#include "vfio_pci_priv.h"

struct vfio_pci_irq_ctx {
+ bool emulated:1;
struct eventfd_ctx *trigger;
struct virqfd *unmask;
struct virqfd *mask;
@@ -497,8 +498,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,
ctx = vfio_irq_ctx_get(intr_ctx, vector);

if (ctx && ctx->trigger) {
- irq_bypass_unregister_producer(&ctx->producer);
- intr_ctx->ops->msi_free_interrupt(intr_ctx, ctx, vector);
+ if (!ctx->emulated) {
+ irq_bypass_unregister_producer(&ctx->producer);
+ intr_ctx->ops->msi_free_interrupt(intr_ctx, ctx, vector);
+ }
kfree(ctx->name);
ctx->name = NULL;
eventfd_ctx_put(ctx->trigger);
@@ -527,6 +530,9 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_intr_ctx *intr_ctx,

ctx->trigger = trigger;

+ if (ctx->emulated)
+ return 0;
+
ret = intr_ctx->ops->msi_request_interrupt(intr_ctx, ctx, vector, index);
if (ret)
goto out_put_eventfd_ctx;
@@ -902,6 +908,83 @@ void vfio_pci_release_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx)
}
EXPORT_SYMBOL_GPL(vfio_pci_release_intr_ctx);

+/*
+ * vfio_pci_send_signal() - Send signal to the eventfd.
+ * @intr_ctx: Interrupt context.
+ * @vector: Vector for which interrupt will be signaled.
+ *
+ * Trigger signal to guest for emulated interrupts.
+ */
+void vfio_pci_send_signal(struct vfio_pci_intr_ctx *intr_ctx, unsigned int vector)
+{
+ struct vfio_pci_irq_ctx *ctx;
+
+ mutex_lock(&intr_ctx->igate);
+
+ ctx = vfio_irq_ctx_get(intr_ctx, vector);
+
+ if (WARN_ON_ONCE(!ctx || !ctx->emulated || !ctx->trigger))
+ goto out_unlock;
+
+ eventfd_signal(ctx->trigger, 1);
+
+out_unlock:
+ mutex_unlock(&intr_ctx->igate);
+}
+EXPORT_SYMBOL_GPL(vfio_pci_send_signal);
+
+/*
+ * vfio_pci_set_emulated() - Set range of interrupts that will be emulated.
+ * @intr_ctx: Interrupt context.
+ * @start: First emulated interrupt vector.
+ * @count: Number of emulated interrupts starting from @start.
+ *
+ * Emulated interrupts will not be backed by hardware interrupts but
+ * instead triggered by virtual device driver.
+ *
+ * Return: error code on failure (-EBUSY if the vector is not available,
+ * -ENOMEM on allocation failure), 0 on success. No partial success, on
+ * success entire range was set as emulated, on failure no interrupt in
+ * range was set as emulated.
+ */
+int vfio_pci_set_emulated(struct vfio_pci_intr_ctx *intr_ctx,
+ unsigned int start, unsigned int count)
+{
+ struct vfio_pci_irq_ctx *ctx;
+ unsigned long i, j;
+ int ret = -EINVAL;
+
+ mutex_lock(&intr_ctx->igate);
+
+ for (i = start; i < start + count; i++) {
+ ctx = kzalloc(sizeof(*ctx), GFP_KERNEL_ACCOUNT);
+ if (!ctx) {
+ ret = -ENOMEM;
+ goto out_err;
+ }
+ ctx->emulated = true;
+ ret = xa_insert(&intr_ctx->ctx, i, ctx, GFP_KERNEL_ACCOUNT);
+ if (ret) {
+ kfree(ctx);
+ goto out_err;
+ }
+ }
+
+ mutex_unlock(&intr_ctx->igate);
+ return 0;
+
+out_err:
+ for (j = start; j < i; j++) {
+ ctx = vfio_irq_ctx_get(intr_ctx, j);
+ vfio_irq_ctx_free(intr_ctx, ctx, j);
+ }
+
+ mutex_unlock(&intr_ctx->igate);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_set_emulated);
+
int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
unsigned int index, unsigned int start,
unsigned int count, void *data)
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index d5140a732741..4fe0df25162f 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -178,6 +178,9 @@ void vfio_pci_release_intr_ctx(struct vfio_pci_intr_ctx *intr_ctx);
int vfio_pci_set_irqs_ioctl(struct vfio_pci_intr_ctx *intr_ctx, uint32_t flags,
unsigned int index, unsigned int start,
unsigned int count, void *data);
+void vfio_pci_send_signal(struct vfio_pci_intr_ctx *intr_ctx, unsigned int vector);
+int vfio_pci_set_emulated(struct vfio_pci_intr_ctx *intr_ctx,
+ unsigned int start, unsigned int count);
int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
void __user *arg, size_t argsz);
ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
--
2.34.1

2023-10-31 07:31:46

by Tian, Kevin

[permalink] [raw]
Subject: RE: [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS)

> From: Chatre, Reinette <[email protected]>
> Sent: Saturday, October 28, 2023 1:01 AM
>
> Changes since RFC V2:
> - RFC V2:
> https://lore.kernel.org/lkml/[email protected]
> /
> - Still submiting this as RFC series. I believe that this now matches the
> expectatations raised during earlier reviews. If you agree this is
> the right direction then I can drop the RFC prefix on next submission.
> If you do not agree then please do let me know where I missed
> expectations.

Overall this matches my expectation. Let's wait for Alex/Jason's thoughts
before moving to next-level refinement.

btw, as commented on the last version, if this is the agreed direction
the next version can probably be split into two parts: part1 contains
the new framework and converts the intel vgpu driver to use it, then
part2 adds the IMS specific logic.

this way part1 can be verified and merged as an integral part.

2023-11-01 18:09:16

by Alex Williamson

[permalink] [raw]
Subject: Re: [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS)

On Tue, 31 Oct 2023 07:31:24 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Chatre, Reinette <[email protected]>
> > Sent: Saturday, October 28, 2023 1:01 AM
> >
> > Changes since RFC V2:
> > - RFC V2:
> > https://lore.kernel.org/lkml/[email protected]
> > /
> > - Still submiting this as RFC series. I believe that this now matches the
> > expectatations raised during earlier reviews. If you agree this is
> > the right direction then I can drop the RFC prefix on next submission.
> > If you do not agree then please do let me know where I missed
> > expectations.
>
> Overall this matches my expectation. Let's wait for Alex/Jason's thoughts
> before moving to next-level refinement.

It feels like there's a lot of gratuitous change without any clear
purpose. We create an ops structure so that a variant/mdev driver can
make use of the vfio-pci-core set_irqs ioctl piecemeal, but then the
two entry points that are actually implemented by the ims version are
the same as the core version, so the ops appear to be at the wrong
level. The use of the priv pointer for the core callbacks looks like
it's just trying to justify the existence of the opaque pointer, it
should really just be using container_of(). We drill down into various
support functions for MSI (ie. enable, disable, request_interrupt,
free_interrupt, device name), but INTx is largely ignored, where we
haven't even kept is_intx() consistent with the other helpers.
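The container_of() alternative to an opaque priv pointer can be shown in
plain userspace C (the structure names below are simplified stand-ins for
vfio_pci_core_device embedding vfio_pci_intr_ctx): the embedding structure
is recovered from the address of the embedded member, with no back-pointer
stored at all.

```c
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct intr_ctx {
	int irq_type;
};

struct core_device {
	int id;
	struct intr_ctx intr_ctx;	/* embedded, not a pointer */
};

/* A core callback receives only the intr_ctx and recovers the owning
 * device from it, making an opaque priv pointer unnecessary. */
static struct core_device *to_core_device(struct intr_ctx *ctx)
{
	return container_of(ctx, struct core_device, intr_ctx);
}
```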

Without an in-tree user of this code, we're just chopping up code for
no real purpose. There's no reason that a variant driver requiring IMS
couldn't initially implement their own SET_IRQS ioctl. Doing that
might lead to a more organic solution where we create interfaces where
they're actually needed. The existing mdev sample drivers should also
be considered in any schemes to refactor the core code into a generic
SET_IRQS helper for devices exposing a vfio-pci API. Thanks,

Alex

2023-11-02 02:52:18

by Tian, Kevin

[permalink] [raw]
Subject: RE: [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS)

> From: Alex Williamson <[email protected]>
> Sent: Thursday, November 2, 2023 2:07 AM
>
> On Tue, 31 Oct 2023 07:31:24 +0000
> "Tian, Kevin" <[email protected]> wrote:
>
> > > From: Chatre, Reinette <[email protected]>
> > > Sent: Saturday, October 28, 2023 1:01 AM
> > >
> > > Changes since RFC V2:
> > > - RFC V2:
> > >
> https://lore.kernel.org/lkml/[email protected]
> > > /
> > > - Still submiting this as RFC series. I believe that this now matches the
> > > expectatations raised during earlier reviews. If you agree this is
> > > the right direction then I can drop the RFC prefix on next submission.
> > > If you do not agree then please do let me know where I missed
> > > expectations.
> >
> > Overall this matches my expectation. Let's wait for Alex/Jason's thoughts
> > before moving to next-level refinement.
>
> It feels like there's a lot of gratuitous change without any clear
> purpose. We create an ops structure so that a variant/mdev driver can
> make use of the vfio-pci-core set_irqs ioctl piecemeal, but then the
> two entry points that are actually implemented by the ims version are
> the same as the core version, so the ops appear to be at the wrong
> level. The use of the priv pointer for the core callbacks looks like
> it's just trying to justify the existence of the opaque pointer, it
> should really just be using container_of(). We drill down into various
> support functions for MSI (ie. enable, disable, request_interrupt,
> free_interrupt, device name), but INTx is largely ignored, where we
> haven't even kept is_intx() consistent with the other helpers.

All of the above are good points. The main interest of this series is to
share the MSI frontend interface with various backends (PCI MSI/X, IMS,
and purely emulated). From this angle the current ops abstraction does
seem to sit at the wrong level. But if we account for your suggestion to
also refactor the mdev sample drivers (e.g. mtty emulates INTx) then
there might be a different outcome.

>
> Without an in-tree user of this code, we're just chopping up code for
> no real purpose. There's no reason that a variant driver requiring IMS
> couldn't initially implement their own SET_IRQS ioctl. Doing that

This is an interesting idea. We haven't seen a real usage which wants
such MSI emulation on IMS for variant drivers, but if the code is
simple enough to demonstrate the 1st user of IMS it might not be
a bad choice. There is additional trap-emulation required in the
device MMIO BAR (mostly copying the MSI permission entry, which contains
PASID info, to the corresponding IMS entry). At a glance that area
is 4k-aligned so it should be doable.

Let's explore this option further.

> might lead to a more organic solution where we create interfaces where
> they're actually needed. The existing mdev sample drivers should also
> be considered in any schemes to refactor the core code into a generic
> SET_IRQS helper for devices exposing a vfio-pci API. Thanks,
>

In this case we'll have mtty to demonstrate an emulated INTx backend
and intel vgpu to contain an emulated MSI backend.

And moving forward all drivers with a vfio-pci API should share the same
frontend interface.

2023-11-02 03:15:44

by Tian, Kevin

[permalink] [raw]
Subject: RE: [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS)

> From: Tian, Kevin
> Sent: Thursday, November 2, 2023 10:52 AM
>
> >
> > Without an in-tree user of this code, we're just chopping up code for
> > no real purpose. There's no reason that a variant driver requiring IMS
> > couldn't initially implement their own SET_IRQS ioctl. Doing that
>
> this is an interesting idea. We haven't seen a real usage which wants
> such MSI emulation on IMS for variant drivers. but if the code is
> simple enough to demonstrate the 1st user of IMS it might not be
> a bad choice. There are additional trap-emulation required in the
> device MMIO bar (mostly copying MSI permission entry which contains
> PASID info to the corresponding IMS entry). At a glance that area
> is 4k-aligned so should be doable.
>

I misread the spec. The MSI-X permission table, which provides
auxiliary data to the MSI-X table, is not 4k-aligned. It sits in the 1st
4k page together with many other registers. Emulating them
could be simple with a native read/write handler, but I am not sure
whether any of them sit in a hot path where trapping would hurt
performance...

2023-11-02 21:14:55

by Alex Williamson

[permalink] [raw]
Subject: Re: [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS)

On Thu, 2 Nov 2023 03:14:09 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Tian, Kevin
> > Sent: Thursday, November 2, 2023 10:52 AM
> >
> > >
> > > Without an in-tree user of this code, we're just chopping up code for
> > > no real purpose. There's no reason that a variant driver requiring IMS
> > > couldn't initially implement their own SET_IRQS ioctl. Doing that
> >
> > this is an interesting idea. We haven't seen a real usage which wants
> > such MSI emulation on IMS for variant drivers. but if the code is
> > simple enough to demonstrate the 1st user of IMS it might not be
> > a bad choice. There are additional trap-emulation required in the
> > device MMIO bar (mostly copying MSI permission entry which contains
> > PASID info to the corresponding IMS entry). At a glance that area
> > is 4k-aligned so should be doable.
> >
>
> misread the spec. the MSI-X permission table which provides
> auxiliary data to MSI-X table is not 4k-aligned. It sits in the 1st
> 4k page together with many other registers. emulation of them
> could be simple with a native read/write handler but not sure
> whether any of them may sit in a hot path to affect perf due to
> trap...

I'm not sure if you're referring to a specific device spec or the PCI
spec, but the PCI spec has long included an implementation note
suggesting alignment of the MSI-X vector table and PBA and separation
from CSRs, and I see this is now even more strongly worded in the 6.0
spec.

Note though that for QEMU, these are emulated in the VMM and not
written through to the device. The result of writes to the vector
table in the VMM are translated to vector use/unuse operations, which
we see at the kernel level through SET_IRQS ioctl calls. Are you
expecting to get PASID information written by the guest through the
emulated vector table? That would entail something more than a simple
IMS backend to MSI-X frontend. Thanks,

Alex

2023-11-03 07:23:27

by Tian, Kevin

Subject: RE: [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS)

> From: Alex Williamson <[email protected]>
> Sent: Friday, November 3, 2023 5:14 AM
>
> On Thu, 2 Nov 2023 03:14:09 +0000
> "Tian, Kevin" <[email protected]> wrote:
>
> > > From: Tian, Kevin
> > > Sent: Thursday, November 2, 2023 10:52 AM
> > >
> > > >
> > > > Without an in-tree user of this code, we're just chopping up code for
> > > > no real purpose. There's no reason that a variant driver requiring IMS
> > > > couldn't initially implement their own SET_IRQS ioctl. Doing that
> > >
> > > this is an interesting idea. We haven't seen a real usage which wants
> > > such MSI emulation on IMS for variant drivers. but if the code is
> > > simple enough to demonstrate the 1st user of IMS it might not be
> > > a bad choice. There are additional trap-emulation required in the
> > > device MMIO bar (mostly copying MSI permission entry which contains
> > > PASID info to the corresponding IMS entry). At a glance that area
> > > is 4k-aligned so should be doable.
> > >
> >
> > misread the spec. the MSI-X permission table which provides
> > auxiliary data to MSI-X table is not 4k-aligned. It sits in the 1st
> > 4k page together with many other registers. emulation of them
> > could be simple with a native read/write handler but not sure
> > whether any of them may sit in a hot path to affect perf due to
> > trap...
>
> I'm not sure if you're referring to a specific device spec or the PCI
> spec, but the PCI spec has long included an implementation note
> suggesting alignment of the MSI-X vector table and pba and separation
> from CSRs, and I see this is now even more strongly worded in the 6.0
> spec.
>
> Note though that for QEMU, these are emulated in the VMM and not
> written through to the device. The result of writes to the vector
> table in the VMM are translated to vector use/unuse operations, which
> we see at the kernel level through SET_IRQS ioctl calls. Are you
> expecting to get PASID information written by the guest through the
> emulated vector table? That would entail something more than a simple
> IMS backend to MSI-X frontend. Thanks,
>

I was referring to the IDXD device spec. Basically it allows a process to
submit a descriptor which contains a completion interrupt handle.
The handle is the index of an MSI-X entry or IMS entry allocated by
the idxd driver. To mark the association between an application and
its handles the driver records the PASID of the application
in an auxiliary structure for MSI-X (called the MSI-X Permissions Table)
or directly in the IMS entry. This additional info includes whether
an MSI-X/IMS entry has PASID enabled and, if so, the PASID
value to be checked against the descriptor.

As you said, virtualizing the MSI-X table itself is done via SET_IRQS and
it's 4k-aligned. Then we also need to capture guest updates to the MSI-X
Permissions Table and copy the PASID information into the
corresponding IMS entry when using the IMS backend. Since the MSI-X
Permissions Table is not 4k-aligned, trapping it will also affect adjacent
registers.

A quick check of the idxd spec doesn't reveal a real impact on any
perf-critical path. Most registers are configuration/control registers
accessed at driver init time, plus a few interrupt registers related
to errors or administrative purposes.

2023-11-03 15:52:24

by Alex Williamson

Subject: Re: [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS)

On Fri, 3 Nov 2023 07:23:13 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Alex Williamson <[email protected]>
> > Sent: Friday, November 3, 2023 5:14 AM
> >
> > On Thu, 2 Nov 2023 03:14:09 +0000
> > "Tian, Kevin" <[email protected]> wrote:
> >
> > > > From: Tian, Kevin
> > > > Sent: Thursday, November 2, 2023 10:52 AM
> > > >
> > > > >
> > > > > Without an in-tree user of this code, we're just chopping up code for
> > > > > no real purpose. There's no reason that a variant driver requiring IMS
> > > > > couldn't initially implement their own SET_IRQS ioctl. Doing that
> > > >
> > > > this is an interesting idea. We haven't seen a real usage which wants
> > > > such MSI emulation on IMS for variant drivers. but if the code is
> > > > simple enough to demonstrate the 1st user of IMS it might not be
> > > > a bad choice. There are additional trap-emulation required in the
> > > > device MMIO bar (mostly copying MSI permission entry which contains
> > > > PASID info to the corresponding IMS entry). At a glance that area
> > > > is 4k-aligned so should be doable.
> > > >
> > >
> > > misread the spec. the MSI-X permission table which provides
> > > auxiliary data to MSI-X table is not 4k-aligned. It sits in the 1st
> > > 4k page together with many other registers. emulation of them
> > > could be simple with a native read/write handler but not sure
> > > whether any of them may sit in a hot path to affect perf due to
> > > trap...
> >
> > I'm not sure if you're referring to a specific device spec or the PCI
> > spec, but the PCI spec has long included an implementation note
> > suggesting alignment of the MSI-X vector table and pba and separation
> > from CSRs, and I see this is now even more strongly worded in the 6.0
> > spec.
> >
> > Note though that for QEMU, these are emulated in the VMM and not
> > written through to the device. The result of writes to the vector
> > table in the VMM are translated to vector use/unuse operations, which
> > we see at the kernel level through SET_IRQS ioctl calls. Are you
> > expecting to get PASID information written by the guest through the
> > emulated vector table? That would entail something more than a simple
> > IMS backend to MSI-X frontend. Thanks,
> >
>
> I was referring to IDXD device spec. Basically it allows a process to
> submit a descriptor which contains a completion interrupt handle.
> The handle is the index of a MSI-X entry or IMS entry allocated by
> the idxd driver. To mark the association between application and
> related handles the driver records the PASID of the application
> in an auxiliary structure for MSI-X (called MSI-X permission table)
> or directly in the IMS entry. This additional info includes whether
> an MSI-X/IMS entry has PASID enabled and if yes what is the PASID
> value to be checked against the descriptor.
>
> As you said virtualizing MSI-X table itself is via SET_IRQS and it's
> 4k aligned. Then we also need to capture guest updates to the MSI-X
> permission table and copy the PASID information into the
> corresponding IMS entry when using the IMS backend. It's MSI-X
> permission table not 4k aligned then trapping it will affect adjacent
> registers.
>
> My quick check in idxd spec doesn't reveal an real impact in perf
> critical path. Most registers are configuration/control registers
> accessed at driver init time and a few interrupt registers related
> to errors or administrative purpose.

Right, it looks like you'll need to trap writes to the MSI-X
Permissions Table via a sparse mmap capability to avoid assumptions
whether it lives on the same page as the MSI-X vector table or PBA.
Ideally the hardware folks have considered this to avoid any conflict
with latency sensitive registers.

The variant driver would use this for collecting the metadata related
to the IMS interrupt, but this is all tangential to whether we
preemptively slice up vfio-pci-core's SET_IRQS ioctl or the iDXD driver
implements its own.

And just to be clear, I don't expect the iDXD variant driver to go to
extraordinary lengths to duplicate the core ioctl, we can certainly
refactor and export things where it makes sense, but I think it likely
makes more sense for the variant driver to implement the shell of the
ioctl rather than trying to multiplex the entire core ioctl with an ops
structure that's so intimately tied to the core implementation and
focused only on the MSI-X code paths. Thanks,

Alex

2023-11-07 08:29:34

by Tian, Kevin

Subject: RE: [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS)

> From: Alex Williamson <[email protected]>
> Sent: Friday, November 3, 2023 11:51 PM
>
> On Fri, 3 Nov 2023 07:23:13 +0000
> "Tian, Kevin" <[email protected]> wrote:
>
> > > From: Alex Williamson <[email protected]>
> > > Sent: Friday, November 3, 2023 5:14 AM
> > >
> > > On Thu, 2 Nov 2023 03:14:09 +0000
> > > "Tian, Kevin" <[email protected]> wrote:
> > >
> > > > > From: Tian, Kevin
> > > > > Sent: Thursday, November 2, 2023 10:52 AM
> > > > >
> > > > > >
> > > > > > Without an in-tree user of this code, we're just chopping up code for
> > > > > > no real purpose. There's no reason that a variant driver requiring
> IMS
> > > > > > couldn't initially implement their own SET_IRQS ioctl. Doing that
> > > > >
> > > > > this is an interesting idea. We haven't seen a real usage which wants
> > > > > such MSI emulation on IMS for variant drivers. but if the code is
> > > > > simple enough to demonstrate the 1st user of IMS it might not be
> > > > > a bad choice. There are additional trap-emulation required in the
> > > > > device MMIO bar (mostly copying MSI permission entry which
> contains
> > > > > PASID info to the corresponding IMS entry). At a glance that area
> > > > > is 4k-aligned so should be doable.
> > > > >
> > > >
> > > > misread the spec. the MSI-X permission table which provides
> > > > auxiliary data to MSI-X table is not 4k-aligned. It sits in the 1st
> > > > 4k page together with many other registers. emulation of them
> > > > could be simple with a native read/write handler but not sure
> > > > whether any of them may sit in a hot path to affect perf due to
> > > > trap...
> > >
> > > I'm not sure if you're referring to a specific device spec or the PCI
> > > spec, but the PCI spec has long included an implementation note
> > > suggesting alignment of the MSI-X vector table and pba and separation
> > > from CSRs, and I see this is now even more strongly worded in the 6.0
> > > spec.
> > >
> > > Note though that for QEMU, these are emulated in the VMM and not
> > > written through to the device. The result of writes to the vector
> > > table in the VMM are translated to vector use/unuse operations, which
> > > we see at the kernel level through SET_IRQS ioctl calls. Are you
> > > expecting to get PASID information written by the guest through the
> > > emulated vector table? That would entail something more than a simple
> > > IMS backend to MSI-X frontend. Thanks,
> > >
> >
> > I was referring to IDXD device spec. Basically it allows a process to
> > submit a descriptor which contains a completion interrupt handle.
> > The handle is the index of a MSI-X entry or IMS entry allocated by
> > the idxd driver. To mark the association between application and
> > related handles the driver records the PASID of the application
> > in an auxiliary structure for MSI-X (called MSI-X permission table)
> > or directly in the IMS entry. This additional info includes whether
> > an MSI-X/IMS entry has PASID enabled and if yes what is the PASID
> > value to be checked against the descriptor.
> >
> > As you said virtualizing MSI-X table itself is via SET_IRQS and it's
> > 4k aligned. Then we also need to capture guest updates to the MSI-X
> > permission table and copy the PASID information into the
> > corresponding IMS entry when using the IMS backend. It's MSI-X
> > permission table not 4k aligned then trapping it will affect adjacent
> > registers.
> >
> > My quick check in idxd spec doesn't reveal an real impact in perf
> > critical path. Most registers are configuration/control registers
> > accessed at driver init time and a few interrupt registers related
> > to errors or administrative purpose.
>
> Right, it looks like you'll need to trap writes to the MSI-X
> Permissions Table via a sparse mmap capability to avoid assumptions
> whether it lives on the same page as the MSI-X vector table or PBA.
> Ideally the hardware folks have considered this to avoid any conflict
> with latency sensitive registers.
>
> The variant driver would use this for collecting the meta data relative
> to the IMS interrupt, but this is all tangential to whether we
> preemptively slice up vfio-pci-core's SET_IRQS ioctl or the iDXD driver
> implements its own.

Agree

>
> And just to be clear, I don't expect the iDXD variant driver to go to
> extraordinary lengths to duplicate the core ioctl, we can certainly
> refactor and export things where it makes sense, but I think it likely
> makes more sense for the variant driver to implement the shell of the
> ioctl rather than trying to multiplex the entire core ioctl with an ops
> structure that's so intimately tied to the core implementation and
> focused only on the MSI-X code paths. Thanks,
>

btw I'll let Reinette decide whether she wants to implement such
a variant driver or wait until the idxd SIOV driver is ready to demonstrate
the usage. One concern with the variant driver approach is the lack
of a real-world usage (e.g. what does IMS bring when the guest only wants
MSI-X on an assigned PF). Though it may provide a shorter path
to enabling the IMS backend support, it also comes with a long-term
maintenance burden.