2024-02-02 05:00:10

by Reinette Chatre

[permalink] [raw]
Subject: [PATCH 00/17] vfio/pci: Remove duplicate code and logic from VFIO PCI interrupt management

Hi Everybody,

Duplicate VFIO PCI interrupt management code and logic can be found
in two areas:
1) the VFIO PCI core (vfio_pci_core.c) directly accesses the
VFIO PCI interrupt lock and data to duplicate actions available
via the VFIO PCI interrupt management code (vfio_pci_intrs.c), and
2) the INTx and MSI/MSI-X interrupt management code within
vfio_pci_intrs.c have duplicate logic.

This series addresses (1) and (2) by first containing VFIO PCI interrupt
management within the VFIO PCI interrupt management code found in
vfio_pci_intrs.c and then refactoring the VFIO PCI interrupt management
code (vfio_pci_intrs.c) to share common logic between INTx, MSI, and
MSI-X interrupt management.

Bulk of the changes result in the same actions as before this series.
Two functional changes to highlight:
a) "vfio/pci: Consistently acquire mutex for interrupt management"
changes vfio_pci_core_disable()->vfio_pci_set_irqs_ioctl() to be
called with the igate mutex held.
b) "vfio/pci: Preserve per-interrupt contexts" changes MSI/MSI-x
interrupt management to preserve per-interrupt contexts across interrupt
allocate/free.

This work was inspired by (and inherits a few patches from) the IMS [1]
enabling work but this work is submitted (and expected to be considered)
independent from IMS.

Any feedback will be appreciated.

Reinette

[1] https://lore.kernel.org/lkml/[email protected]/

Reinette Chatre (17):
vfio/pci: Use unsigned int instead of unsigned
vfio/pci: Remove duplicate check from
vfio_pci_set_ctx_trigger_single() wrappers
vfio/pci: Consistently acquire mutex for interrupt management
vfio/pci: Remove duplicate interrupt management from core VFIO PCI
vfio/pci: Limit eventfd signaling to interrupt management code
vfio/pci: Remove interrupt index interpretation from wrappers
vfio/pci: Preserve per-interrupt contexts
vfio/pci: Extract MSI/MSI-X specific code from common flow
vfio/pci: Converge similar code flows
vfio/pci: Extract INTx specific code from vfio_intx_set_signal()
vfio/pci: Perform MSI/MSI-X interrupt management via callbacks
vfio/pci: Remove msi term from generic code
vfio/pci: Remove vfio_intx_set_signal()
vfio/pci: Add utility to trigger INTx eventfd knowing interrupt
context
vfio/pci: Let enable and disable of interrupt types use same signature
vfio/pci: Move vfio_msi_disable() to be with other MSI/MSI-X
management code
vfio/pci: Remove duplicate interrupt management flow

drivers/vfio/pci/vfio_pci_core.c | 49 ++-
drivers/vfio/pci/vfio_pci_intrs.c | 510 ++++++++++++++++++------------
2 files changed, 318 insertions(+), 241 deletions(-)

base-commit: 41bccc98fb7931d63d03f326a746ac4d429c1dd3
--
2.34.1



2024-02-02 05:00:47

by Reinette Chatre

[permalink] [raw]
Subject: [PATCH 08/17] vfio/pci: Extract MSI/MSI-X specific code from common flow

vfio_intx_set_signal() and vfio_msi_set_signal() have similar code
flow mixed with actions specific to the interrupt type.

Code that is similar between MSI, MSI-X, and INTx management can
be shared instead of duplicated. Start by replacing the MSI/MSI-X
specific code within vfio_msi_set_signal() with functions. These
functions that are specific to the management of MSI/MSI-X can
later be called from a shared flow.

Signed-off-by: Reinette Chatre <[email protected]>
---
Note to maintainers:
This is similar to "vfio/pci: Separate frontend and backend code during
interrupt enable/disable" that was submitted as part of IMS changes,
but is not specific to IMS.
https://lore.kernel.org/lkml/[email protected]

drivers/vfio/pci/vfio_pci_intrs.c | 134 ++++++++++++++++++++----------
1 file changed, 89 insertions(+), 45 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 7ca2b983b66e..70f2382c9c0c 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -414,26 +414,99 @@ static int vfio_msi_alloc_irq(struct vfio_pci_core_device *vdev,
return map.index < 0 ? map.index : map.virq;
}

+static void vfio_msi_free_interrupt(struct vfio_pci_core_device *vdev,
+ struct vfio_pci_irq_ctx *ctx,
+ unsigned int vector)
+{
+ struct pci_dev *pdev = vdev->pdev;
+ int irq;
+ u16 cmd;
+
+ irq = pci_irq_vector(pdev, vector);
+ cmd = vfio_pci_memory_lock_and_enable(vdev);
+ free_irq(irq, ctx->trigger);
+ vfio_pci_memory_unlock_and_restore(vdev, cmd);
+ /*Interrupt stays allocated, will be freed at MSI/MSI-X disable. */
+}
+
+static int vfio_msi_request_interrupt(struct vfio_pci_core_device *vdev,
+ struct vfio_pci_irq_ctx *ctx,
+ unsigned int vector, unsigned int index)
+{
+ bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
+ int irq, ret;
+ u16 cmd;
+
+ irq = vfio_msi_alloc_irq(vdev, vector, msix);
+ if (irq < 0)
+ return irq;
+
+ /*
+ * If the vector was previously allocated, refresh the on-device
+ * message data before enabling in case it had been cleared or
+ * corrupted (e.g. due to backdoor resets) since writing.
+ */
+ cmd = vfio_pci_memory_lock_and_enable(vdev);
+ if (msix) {
+ struct msi_msg msg;
+
+ get_cached_msi_msg(irq, &msg);
+ pci_write_msi_msg(irq, &msg);
+ }
+
+ ret = request_irq(irq, vfio_msihandler, 0, ctx->name, ctx->trigger);
+ vfio_pci_memory_unlock_and_restore(vdev, cmd);
+
+ return ret;
+}
+
+static char *vfio_msi_device_name(struct vfio_pci_core_device *vdev,
+ unsigned int vector, unsigned int index)
+{
+ struct pci_dev *pdev = vdev->pdev;
+
+ return kasprintf(GFP_KERNEL_ACCOUNT, "vfio-msi%s[%d](%s)",
+ index == VFIO_PCI_MSIX_IRQ_INDEX ? "x" : "",
+ vector, pci_name(pdev));
+}
+
+static void vfio_msi_register_producer(struct vfio_pci_core_device *vdev,
+ struct vfio_pci_irq_ctx *ctx,
+ unsigned int vector)
+{
+ struct pci_dev *pdev = vdev->pdev;
+ int ret;
+
+ ctx->producer.token = ctx->trigger;
+ ctx->producer.irq = pci_irq_vector(pdev, vector);
+ ret = irq_bypass_register_producer(&ctx->producer);
+ if (unlikely(ret)) {
+ dev_info(&pdev->dev,
+ "irq bypass producer (token %p) registration fails: %d\n",
+ ctx->producer.token, ret);
+ ctx->producer.token = NULL;
+ ctx->producer.irq = 0;
+ }
+}
+
+static void vfio_msi_unregister_producer(struct vfio_pci_irq_ctx *ctx)
+{
+ irq_bypass_unregister_producer(&ctx->producer);
+}
+
static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
unsigned int vector, int fd,
unsigned int index)
{
- bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
- struct pci_dev *pdev = vdev->pdev;
struct vfio_pci_irq_ctx *ctx;
struct eventfd_ctx *trigger;
- int irq = -EINVAL, ret;
- u16 cmd;
+ int ret;

ctx = vfio_irq_ctx_get(vdev, vector);

if (ctx && ctx->trigger) {
- irq_bypass_unregister_producer(&ctx->producer);
- irq = pci_irq_vector(pdev, vector);
- cmd = vfio_pci_memory_lock_and_enable(vdev);
- free_irq(irq, ctx->trigger);
- vfio_pci_memory_unlock_and_restore(vdev, cmd);
- /* Interrupt stays allocated, will be freed at MSI-X disable. */
+ vfio_msi_unregister_producer(ctx);
+ vfio_msi_free_interrupt(vdev, ctx, vector);
kfree(ctx->name);
ctx->name = NULL;
eventfd_ctx_put(ctx->trigger);
@@ -443,13 +516,6 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
if (fd < 0)
return 0;

- if (irq == -EINVAL) {
- /* Interrupt stays allocated, will be freed at MSI-X disable. */
- irq = vfio_msi_alloc_irq(vdev, vector, msix);
- if (irq < 0)
- return irq;
- }
-
/* Per-interrupt context remain allocated. */
if (!ctx) {
ctx = vfio_irq_ctx_alloc(vdev, vector);
@@ -457,8 +523,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
return -ENOMEM;
}

- ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-msi%s[%d](%s)",
- msix ? "x" : "", vector, pci_name(pdev));
+ ctx->name = vfio_msi_device_name(vdev, vector, index);
if (!ctx->name)
return -ENOMEM;

@@ -468,40 +533,19 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
goto out_free_name;
}

- /*
- * If the vector was previously allocated, refresh the on-device
- * message data before enabling in case it had been cleared or
- * corrupted (e.g. due to backdoor resets) since writing.
- */
- cmd = vfio_pci_memory_lock_and_enable(vdev);
- if (msix) {
- struct msi_msg msg;
-
- get_cached_msi_msg(irq, &msg);
- pci_write_msi_msg(irq, &msg);
- }
+ ctx->trigger = trigger;

- ret = request_irq(irq, vfio_msihandler, 0, ctx->name, trigger);
- vfio_pci_memory_unlock_and_restore(vdev, cmd);
+ ret = vfio_msi_request_interrupt(vdev, ctx, vector, index);
if (ret)
goto out_put_eventfd_ctx;

- ctx->producer.token = trigger;
- ctx->producer.irq = irq;
- ret = irq_bypass_register_producer(&ctx->producer);
- if (unlikely(ret)) {
- dev_info(&pdev->dev,
- "irq bypass producer (token %p) registration fails: %d\n",
- ctx->producer.token, ret);
-
- ctx->producer.token = NULL;
- }
- ctx->trigger = trigger;
+ vfio_msi_register_producer(vdev, ctx, vector);

return 0;

out_put_eventfd_ctx:
- eventfd_ctx_put(trigger);
+ eventfd_ctx_put(ctx->trigger);
+ ctx->trigger = NULL;
out_free_name:
kfree(ctx->name);
ctx->name = NULL;
--
2.34.1


2024-02-02 05:00:51

by Reinette Chatre

[permalink] [raw]
Subject: [PATCH 04/17] vfio/pci: Remove duplicate interrupt management from core VFIO PCI

The eventfds associated with device request notification and
error IRQ are managed by the VFIO PCI interrupt management code
contained in vfio_pci_intrs.c. The VFIO PCI interrupt management
code supports acquiring, releasing, as well as signaling of
these eventfd.

The VFIO PCI core code duplicates the signaling of device request
notification and error interrupts and does so by acquiring the lock
and accessing data that should ideally be contained within the
VFIO PCI interrupt management code.

Do not duplicate but instead call existing VFIO PCI interrupt
management code to signal device request notification
and error interrupts.

Signed-off-by: Reinette Chatre <[email protected]>
---
drivers/vfio/pci/vfio_pci_core.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index d2847ca2f0cb..de097174e455 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -700,16 +700,12 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
#endif
vfio_pci_core_disable(vdev);

- mutex_lock(&vdev->igate);
- if (vdev->err_trigger) {
- eventfd_ctx_put(vdev->err_trigger);
- vdev->err_trigger = NULL;
- }
- if (vdev->req_trigger) {
- eventfd_ctx_put(vdev->req_trigger);
- vdev->req_trigger = NULL;
- }
- mutex_unlock(&vdev->igate);
+ vfio_pci_set_irqs_ioctl(vdev, VFIO_IRQ_SET_DATA_NONE |
+ VFIO_IRQ_SET_ACTION_TRIGGER,
+ VFIO_PCI_ERR_IRQ_INDEX, 0, 0, NULL);
+ vfio_pci_set_irqs_ioctl(vdev, VFIO_IRQ_SET_DATA_NONE |
+ VFIO_IRQ_SET_ACTION_TRIGGER,
+ VFIO_PCI_REQ_IRQ_INDEX, 0, 0, NULL);
}
EXPORT_SYMBOL_GPL(vfio_pci_core_close_device);

--
2.34.1


2024-02-02 05:01:00

by Reinette Chatre

[permalink] [raw]
Subject: [PATCH 10/17] vfio/pci: Extract INTx specific code from vfio_intx_set_signal()

vfio_intx_set_signal() and vfio_msi_set_vector_signal() use the
same code flow for INTx, MSI, and MSI-X interrupt management.

Extract the INTx specific code from vfio_intx_set_signal() to
leave behind the same flow as vfio_msi_set_vector_signal(),
ready for sharing.

Signed-off-by: Reinette Chatre <[email protected]>
---
drivers/vfio/pci/vfio_pci_intrs.c | 69 ++++++++++++++++++++++---------
1 file changed, 49 insertions(+), 20 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 98b099f58b2b..6eef4e2d7c13 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -260,15 +260,58 @@ static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
return 0;
}

+static void vfio_intx_free_interrupt(struct vfio_pci_core_device *vdev,
+ struct vfio_pci_irq_ctx *ctx,
+ unsigned int vector)
+{
+ struct pci_dev *pdev = vdev->pdev;
+
+ free_irq(pdev->irq, vdev);
+}
+
+static int vfio_intx_request_interrupt(struct vfio_pci_core_device *vdev,
+ struct vfio_pci_irq_ctx *ctx,
+ unsigned int vector, unsigned int index)
+{
+ unsigned long irqflags = IRQF_SHARED;
+ struct pci_dev *pdev = vdev->pdev;
+ unsigned long flags;
+ int ret;
+
+ if (!vdev->pci_2_3)
+ irqflags = 0;
+
+ ret = request_irq(pdev->irq, vfio_intx_handler, irqflags,
+ ctx->name, vdev);
+ if (ret)
+ return ret;
+
+ /*
+ * INTx disable will stick across the new irq setup,
+ * disable_irq won't.
+ */
+ spin_lock_irqsave(&vdev->irqlock, flags);
+ if (!vdev->pci_2_3 && ctx->masked)
+ disable_irq_nosync(pdev->irq);
+ spin_unlock_irqrestore(&vdev->irqlock, flags);
+
+ return 0;
+}
+
+static char *vfio_intx_device_name(struct vfio_pci_core_device *vdev,
+ unsigned int vector, unsigned int index)
+{
+ struct pci_dev *pdev = vdev->pdev;
+
+ return kasprintf(GFP_KERNEL_ACCOUNT, "vfio-intx(%s)", pci_name(pdev));
+}
+
static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev,
unsigned int vector, int fd,
unsigned int index)
{
- struct pci_dev *pdev = vdev->pdev;
- unsigned long irqflags = IRQF_SHARED;
struct vfio_pci_irq_ctx *ctx;
struct eventfd_ctx *trigger;
- unsigned long flags;
int ret;

ctx = vfio_irq_ctx_get(vdev, 0);
@@ -276,7 +319,7 @@ static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev,
return -EINVAL;

if (ctx->trigger) {
- free_irq(pdev->irq, vdev);
+ vfio_intx_free_interrupt(vdev, ctx, vector);
kfree(ctx->name);
eventfd_ctx_put(ctx->trigger);
ctx->trigger = NULL;
@@ -285,8 +328,7 @@ static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev,
if (fd < 0) /* Disable only */
return 0;

- ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-intx(%s)",
- pci_name(pdev));
+ ctx->name = vfio_intx_device_name(vdev, vector, index);
if (!ctx->name)
return -ENOMEM;

@@ -298,11 +340,7 @@ static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev,

ctx->trigger = trigger;

- if (!vdev->pci_2_3)
- irqflags = 0;
-
- ret = request_irq(pdev->irq, vfio_intx_handler,
- irqflags, ctx->name, vdev);
+ ret = vfio_intx_request_interrupt(vdev, ctx, vector, index);
if (ret) {
ctx->trigger = NULL;
kfree(ctx->name);
@@ -310,15 +348,6 @@ static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev,
return ret;
}

- /*
- * INTx disable will stick across the new irq setup,
- * disable_irq won't.
- */
- spin_lock_irqsave(&vdev->irqlock, flags);
- if (!vdev->pci_2_3 && ctx->masked)
- disable_irq_nosync(pdev->irq);
- spin_unlock_irqrestore(&vdev->irqlock, flags);
-
return 0;
}

--
2.34.1


2024-02-02 05:01:02

by Reinette Chatre

[permalink] [raw]
Subject: [PATCH 05/17] vfio/pci: Limit eventfd signaling to interrupt management code

VFIO PCI interrupt management has full support for configuring,
signaling, and clearing the eventfds associated with device request
notification and error interrupts.

Core VFIO PCI duplicates eventfd signaling while also accessing
data that should be contained within the interrupt management code.

Remove the duplicate eventfd signaling from core VFIO PCI
and rely on VFIO PCI interrupt management for eventfd signaling.

Signed-off-by: Reinette Chatre <[email protected]>
---
drivers/vfio/pci/vfio_pci_core.c | 30 +++++++++++++-----------------
1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index de097174e455..a8235b9c1a81 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1868,21 +1868,20 @@ void vfio_pci_core_request(struct vfio_device *core_vdev, unsigned int count)
struct vfio_pci_core_device *vdev =
container_of(core_vdev, struct vfio_pci_core_device, vdev);
struct pci_dev *pdev = vdev->pdev;
+ int ret;

- mutex_lock(&vdev->igate);
+ ret = vfio_pci_set_irqs_ioctl(vdev, VFIO_IRQ_SET_DATA_NONE |
+ VFIO_IRQ_SET_ACTION_TRIGGER,
+ VFIO_PCI_REQ_IRQ_INDEX, 0, 1, NULL);

- if (vdev->req_trigger) {
- if (!(count % 10))
- pci_notice_ratelimited(pdev,
- "Relaying device request to user (#%u)\n",
- count);
- eventfd_signal(vdev->req_trigger);
- } else if (count == 0) {
+ if (!ret && !(count % 10)) {
+ pci_notice_ratelimited(pdev,
+ "Relaying device request to user (#%u)\n",
+ count);
+ } else if (ret && count == 0) {
pci_warn(pdev,
- "No device request channel registered, blocked until released by user\n");
+ "No device request channel registered, blocked until released by user\n");
}
-
- mutex_unlock(&vdev->igate);
}
EXPORT_SYMBOL_GPL(vfio_pci_core_request);

@@ -2292,12 +2291,9 @@ pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
{
struct vfio_pci_core_device *vdev = dev_get_drvdata(&pdev->dev);

- mutex_lock(&vdev->igate);
-
- if (vdev->err_trigger)
- eventfd_signal(vdev->err_trigger);
-
- mutex_unlock(&vdev->igate);
+ vfio_pci_set_irqs_ioctl(vdev, VFIO_IRQ_SET_DATA_NONE |
+ VFIO_IRQ_SET_ACTION_TRIGGER,
+ VFIO_PCI_ERR_IRQ_INDEX, 0, 1, NULL);

return PCI_ERS_RESULT_CAN_RECOVER;
}
--
2.34.1


2024-02-02 05:01:22

by Reinette Chatre

[permalink] [raw]
Subject: [PATCH 12/17] vfio/pci: Remove msi term from generic code

vfio_msi_set_vector_signal() and by extension vfio_msi_set_block()
are no longer specific to MSI and MSI-X interrupts.

Change the name of these functions in preparation for them
to be used for management of INTx interrupts.

Signed-off-by: Reinette Chatre <[email protected]>
---
drivers/vfio/pci/vfio_pci_intrs.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 07dc388c4513..7f9dc81cb97f 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -557,7 +557,7 @@ static struct vfio_pci_intr_ops intr_ops[] = {
},
};

-static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
+static int vfio_irq_set_vector_signal(struct vfio_pci_core_device *vdev,
unsigned int vector, int fd,
unsigned int index)
{
@@ -617,7 +617,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
return ret;
}

-static int vfio_msi_set_block(struct vfio_pci_core_device *vdev,
+static int vfio_irq_set_block(struct vfio_pci_core_device *vdev,
unsigned int start, unsigned int count,
int32_t *fds, unsigned int index)
{
@@ -626,12 +626,12 @@ static int vfio_msi_set_block(struct vfio_pci_core_device *vdev,

for (i = 0, j = start; i < count && !ret; i++, j++) {
int fd = fds ? fds[i] : -1;
- ret = vfio_msi_set_vector_signal(vdev, j, fd, index);
+ ret = vfio_irq_set_vector_signal(vdev, j, fd, index);
}

if (ret) {
for (i = start; i < j; i++)
- vfio_msi_set_vector_signal(vdev, i, -1, index);
+ vfio_irq_set_vector_signal(vdev, i, -1, index);
}

return ret;
@@ -648,7 +648,7 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev,
xa_for_each(&vdev->ctx, i, ctx) {
vfio_virqfd_disable(&ctx->unmask);
vfio_virqfd_disable(&ctx->mask);
- vfio_msi_set_vector_signal(vdev, i, -1, index);
+ vfio_irq_set_vector_signal(vdev, i, -1, index);
vfio_irq_ctx_free(vdev, ctx, i);
}

@@ -786,14 +786,14 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
int ret;

if (vdev->irq_type == index)
- return vfio_msi_set_block(vdev, start, count,
+ return vfio_irq_set_block(vdev, start, count,
fds, index);

ret = vfio_msi_enable(vdev, start + count, index);
if (ret)
return ret;

- ret = vfio_msi_set_block(vdev, start, count, fds, index);
+ ret = vfio_irq_set_block(vdev, start, count, fds, index);
if (ret)
vfio_msi_disable(vdev, index);

--
2.34.1


2024-02-02 05:01:39

by Reinette Chatre

[permalink] [raw]
Subject: [PATCH 06/17] vfio/pci: Remove interrupt index interpretation from wrappers

vfio_pci_set_intx_trigger() and vfio_pci_set_msi_trigger() have similar
enough flows that can be converged into one shared flow instead of
duplicated.

To share code between management of interrupt types it is necessary that
the type of the interrupt is only interpreted by the code specific to
the interrupt type being managed.

Remove interrupt type interpretation from what will eventually be
shared code (vfio_pci_set_msi_trigger(), vfio_msi_set_block()) by
pushing this interpretation to be within the interrupt
type specific code (vfio_msi_enable(),
(temporary) vfio_msi_set_vector_signal()).

Signed-off-by: Reinette Chatre <[email protected]>
---
Note to maintainers:
Originally formed part of the IMS submission below, but is not
specific to IMS.
https://lore.kernel.org/lkml/[email protected]

drivers/vfio/pci/vfio_pci_intrs.c | 38 +++++++++++++++++--------------
1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 97a3bb22b186..31f73c70fcd2 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -346,16 +346,19 @@ static irqreturn_t vfio_msihandler(int irq, void *arg)
return IRQ_HANDLED;
}

-static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msix)
+static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec,
+ unsigned int index)
{
struct pci_dev *pdev = vdev->pdev;
- unsigned int flag = msix ? PCI_IRQ_MSIX : PCI_IRQ_MSI;
+ unsigned int flag;
int ret;
u16 cmd;

if (!is_irq_none(vdev))
return -EINVAL;

+ flag = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? PCI_IRQ_MSIX : PCI_IRQ_MSI;
+
/* return the number of supported vectors if we can't get all: */
cmd = vfio_pci_memory_lock_and_enable(vdev);
ret = pci_alloc_irq_vectors(pdev, 1, nvec, flag);
@@ -367,10 +370,9 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi
}
vfio_pci_memory_unlock_and_restore(vdev, cmd);

- vdev->irq_type = msix ? VFIO_PCI_MSIX_IRQ_INDEX :
- VFIO_PCI_MSI_IRQ_INDEX;
+ vdev->irq_type = index;

- if (!msix) {
+ if (index == VFIO_PCI_MSI_IRQ_INDEX) {
/*
* Compute the virtual hardware field for max msi vectors -
* it is the log base 2 of the number of vectors.
@@ -413,8 +415,10 @@ static int vfio_msi_alloc_irq(struct vfio_pci_core_device *vdev,
}

static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
- unsigned int vector, int fd, bool msix)
+ unsigned int vector, int fd,
+ unsigned int index)
{
+ bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
struct pci_dev *pdev = vdev->pdev;
struct vfio_pci_irq_ctx *ctx;
struct eventfd_ctx *trigger;
@@ -505,25 +509,26 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,

static int vfio_msi_set_block(struct vfio_pci_core_device *vdev,
unsigned int start, unsigned int count,
- int32_t *fds, bool msix)
+ int32_t *fds, unsigned int index)
{
unsigned int i, j;
int ret = 0;

for (i = 0, j = start; i < count && !ret; i++, j++) {
int fd = fds ? fds[i] : -1;
- ret = vfio_msi_set_vector_signal(vdev, j, fd, msix);
+ ret = vfio_msi_set_vector_signal(vdev, j, fd, index);
}

if (ret) {
for (i = start; i < j; i++)
- vfio_msi_set_vector_signal(vdev, i, -1, msix);
+ vfio_msi_set_vector_signal(vdev, i, -1, index);
}

return ret;
}

-static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
+static void vfio_msi_disable(struct vfio_pci_core_device *vdev,
+ unsigned int index)
{
struct pci_dev *pdev = vdev->pdev;
struct vfio_pci_irq_ctx *ctx;
@@ -533,7 +538,7 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
xa_for_each(&vdev->ctx, i, ctx) {
vfio_virqfd_disable(&ctx->unmask);
vfio_virqfd_disable(&ctx->mask);
- vfio_msi_set_vector_signal(vdev, i, -1, msix);
+ vfio_msi_set_vector_signal(vdev, i, -1, index);
}

cmd = vfio_pci_memory_lock_and_enable(vdev);
@@ -656,10 +661,9 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
{
struct vfio_pci_irq_ctx *ctx;
unsigned int i;
- bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;

if (irq_is(vdev, index) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
- vfio_msi_disable(vdev, msix);
+ vfio_msi_disable(vdev, index);
return 0;
}

@@ -672,15 +676,15 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,

if (vdev->irq_type == index)
return vfio_msi_set_block(vdev, start, count,
- fds, msix);
+ fds, index);

- ret = vfio_msi_enable(vdev, start + count, msix);
+ ret = vfio_msi_enable(vdev, start + count, index);
if (ret)
return ret;

- ret = vfio_msi_set_block(vdev, start, count, fds, msix);
+ ret = vfio_msi_set_block(vdev, start, count, fds, index);
if (ret)
- vfio_msi_disable(vdev, msix);
+ vfio_msi_disable(vdev, index);

return ret;
}
--
2.34.1


2024-02-02 05:01:49

by Reinette Chatre

[permalink] [raw]
Subject: [PATCH 13/17] vfio/pci: Remove vfio_intx_set_signal()

The interrupt management flow of vfio_intx_set_signal() is available
from the now generic vfio_irq_set_signal().

Initialize the INTx specific management ops so that vfio_irq_set_signal()
can be used to manage INTx interrupts and point all existing
INTx specific interrupt management calls to vfio_irq_set_signal().

Use vfio_irq_set_block() within vfio_pci_set_intx_trigger() to
highlight its similarities with vfio_pci_set_msi_trigger() for the
next stage of uniting the interrupt management code.
vfio_pci_set_intx_trigger() ensures that start == 0 and count == 1
before vfio_irq_set_block() is called so the loop within it is
essentially a direct call to vfio_irq_set_vector_signal().

Signed-off-by: Reinette Chatre <[email protected]>
---
drivers/vfio/pci/vfio_pci_intrs.c | 62 +++++++------------------------
1 file changed, 13 insertions(+), 49 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 7f9dc81cb97f..d7c2cd739d74 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -46,6 +46,10 @@ struct vfio_pci_intr_ops {
void (*unregister_producer)(struct vfio_pci_irq_ctx *ctx);
};

+static int vfio_irq_set_vector_signal(struct vfio_pci_core_device *vdev,
+ unsigned int vector, int fd,
+ unsigned int index);
+
static bool irq_is(struct vfio_pci_core_device *vdev, int type)
{
return vdev->irq_type == type;
@@ -321,51 +325,6 @@ static char *vfio_intx_device_name(struct vfio_pci_core_device *vdev,
return kasprintf(GFP_KERNEL_ACCOUNT, "vfio-intx(%s)", pci_name(pdev));
}

-static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev,
- unsigned int vector, int fd,
- unsigned int index)
-{
- struct vfio_pci_irq_ctx *ctx;
- struct eventfd_ctx *trigger;
- int ret;
-
- ctx = vfio_irq_ctx_get(vdev, 0);
- if (WARN_ON_ONCE(!ctx))
- return -EINVAL;
-
- if (ctx->trigger) {
- vfio_intx_free_interrupt(vdev, ctx, vector);
- kfree(ctx->name);
- eventfd_ctx_put(ctx->trigger);
- ctx->trigger = NULL;
- }
-
- if (fd < 0) /* Disable only */
- return 0;
-
- ctx->name = vfio_intx_device_name(vdev, vector, index);
- if (!ctx->name)
- return -ENOMEM;
-
- trigger = eventfd_ctx_fdget(fd);
- if (IS_ERR(trigger)) {
- kfree(ctx->name);
- return PTR_ERR(trigger);
- }
-
- ctx->trigger = trigger;
-
- ret = vfio_intx_request_interrupt(vdev, ctx, vector, index);
- if (ret) {
- ctx->trigger = NULL;
- kfree(ctx->name);
- eventfd_ctx_put(trigger);
- return ret;
- }
-
- return 0;
-}
-
static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
{
struct vfio_pci_irq_ctx *ctx;
@@ -376,7 +335,7 @@ static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
vfio_virqfd_disable(&ctx->unmask);
vfio_virqfd_disable(&ctx->mask);
}
- vfio_intx_set_signal(vdev, 0, -1, VFIO_PCI_INTX_IRQ_INDEX);
+ vfio_irq_set_vector_signal(vdev, 0, -1, VFIO_PCI_INTX_IRQ_INDEX);
vdev->irq_type = VFIO_PCI_NUM_IRQS;
vfio_irq_ctx_free(vdev, ctx, 0);
}
@@ -541,6 +500,11 @@ static void vfio_msi_unregister_producer(struct vfio_pci_irq_ctx *ctx)
}

static struct vfio_pci_intr_ops intr_ops[] = {
+ [VFIO_PCI_INTX_IRQ_INDEX] = {
+ .request_interrupt = vfio_intx_request_interrupt,
+ .free_interrupt = vfio_intx_free_interrupt,
+ .device_name = vfio_intx_device_name,
+ },
[VFIO_PCI_MSI_IRQ_INDEX] = {
.request_interrupt = vfio_msi_request_interrupt,
.free_interrupt = vfio_msi_free_interrupt,
@@ -735,17 +699,17 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
return -EINVAL;

if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
- int32_t fd = *(int32_t *)data;
+ int32_t *fds = data;
int ret;

if (is_intx(vdev))
- return vfio_intx_set_signal(vdev, start, fd, index);
+ return vfio_irq_set_block(vdev, start, count, fds, index);

ret = vfio_intx_enable(vdev);
if (ret)
return ret;

- ret = vfio_intx_set_signal(vdev, start, fd, index);
+ ret = vfio_irq_set_block(vdev, start, count, fds, index);
if (ret)
vfio_intx_disable(vdev);

--
2.34.1


2024-02-02 05:01:51

by Reinette Chatre

[permalink] [raw]
Subject: [PATCH 07/17] vfio/pci: Preserve per-interrupt contexts

MSI and MSI-X interrupt management for PCI passthrough devices create
a new per-interrupt context every time an interrupt is allocated,
freeing it when the interrupt is freed.

The per-interrupt context contains the properties of a particular
interrupt. Without a property that persists across interrupt allocation
and free it is acceptable to always create a new per-interrupt context.

INTx interrupt context has a "masked" property that persists across
allocation and free and thus preserves its interrupt context
across interrupt allocation and free calls.

MSI and MSI-X interrupts already remain allocated across interrupt
allocation and free requests, additionally maintaining the
individual interrupt context is a reflection of this existing
behavior and matches INTx behavior so that more code can be shared.

An additional benefit is that maintaining interrupt context supports
a potential future use case of emulated interrupts, where the
"is this interrupt emulated" is a property that needs to persist
across allocation and free requests.

Persistent interrupt contexts means that existence of per-interrupt
context no longer implies a valid trigger, pointers to freed memory
should be cleared, and a new per-interrupt context cannot be assumed
needing allocation when an interrupt is allocated.

Signed-off-by: Reinette Chatre <[email protected]>
---
Note to maintainers:
This addition originally formed part of the IMS work below that mostly
ignored INTx. This work focuses on INTx, MSI, MSI-X where this addition
is relevant.
https://lore.kernel.org/lkml/[email protected]

drivers/vfio/pci/vfio_pci_intrs.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 31f73c70fcd2..7ca2b983b66e 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -427,7 +427,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,

ctx = vfio_irq_ctx_get(vdev, vector);

- if (ctx) {
+ if (ctx && ctx->trigger) {
irq_bypass_unregister_producer(&ctx->producer);
irq = pci_irq_vector(pdev, vector);
cmd = vfio_pci_memory_lock_and_enable(vdev);
@@ -435,8 +435,9 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
vfio_pci_memory_unlock_and_restore(vdev, cmd);
/* Interrupt stays allocated, will be freed at MSI-X disable. */
kfree(ctx->name);
+ ctx->name = NULL;
eventfd_ctx_put(ctx->trigger);
- vfio_irq_ctx_free(vdev, ctx, vector);
+ ctx->trigger = NULL;
}

if (fd < 0)
@@ -449,16 +450,17 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
return irq;
}

- ctx = vfio_irq_ctx_alloc(vdev, vector);
- if (!ctx)
- return -ENOMEM;
+ /* Per-interrupt context remain allocated. */
+ if (!ctx) {
+ ctx = vfio_irq_ctx_alloc(vdev, vector);
+ if (!ctx)
+ return -ENOMEM;
+ }

ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-msi%s[%d](%s)",
msix ? "x" : "", vector, pci_name(pdev));
- if (!ctx->name) {
- ret = -ENOMEM;
- goto out_free_ctx;
- }
+ if (!ctx->name)
+ return -ENOMEM;

trigger = eventfd_ctx_fdget(fd);
if (IS_ERR(trigger)) {
@@ -502,8 +504,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
eventfd_ctx_put(trigger);
out_free_name:
kfree(ctx->name);
-out_free_ctx:
- vfio_irq_ctx_free(vdev, ctx, vector);
+ ctx->name = NULL;
return ret;
}

@@ -539,6 +540,7 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev,
vfio_virqfd_disable(&ctx->unmask);
vfio_virqfd_disable(&ctx->mask);
vfio_msi_set_vector_signal(vdev, i, -1, index);
+ vfio_irq_ctx_free(vdev, ctx, i);
}

cmd = vfio_pci_memory_lock_and_enable(vdev);
@@ -694,7 +696,7 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,

for (i = start; i < start + count; i++) {
ctx = vfio_irq_ctx_get(vdev, i);
- if (!ctx)
+ if (!ctx || !ctx->trigger)
continue;
if (flags & VFIO_IRQ_SET_DATA_NONE) {
eventfd_signal(ctx->trigger);
--
2.34.1


2024-02-02 05:02:14

by Reinette Chatre

[permalink] [raw]
Subject: [PATCH 15/17] vfio/pci: Let enable and disable of interrupt types use same signature

vfio_pci_set_intx_trigger() and vfio_pci_set_msi_trigger() have
flows that can be shared.

For INTx, MSI, and MSI-X interrupt management to share the
same enable/disable flow the interrupt specific enable and
disable functions should have the same signatures.

Let vfio_intx_enable() and vfio_msi_enable() use the same
parameters by passing "start" and "count" to these functions
instead of letting the (what will eventually be) common code
interpret these values.

Providing "start" and "count" to vfio_intx_enable()
enables the INTx specific check of these parameters to move into
the INTx specific vfio_intx_enable(). Similarly, providing "start"
and "count" to vfio_msi_enable() enables the MSI/MSI-X specific
code to initialize number of vectors needed.

The shared MSI/MSI-X code needs the interrupt index. Provide
the interrupt index (clearly marked as unused) to the INTx code
to use the same signatures.

With interrupt type specific code using the same parameters it
is possible to have common code that calls the enable/disable
code for different interrupt types.

Signed-off-by: Reinette Chatre <[email protected]>
---
drivers/vfio/pci/vfio_pci_intrs.c | 27 ++++++++++++++++++---------
1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 37065623d286..9217fea3f636 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -257,13 +257,18 @@ static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
return ret;
}

-static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
+static int vfio_intx_enable(struct vfio_pci_core_device *vdev,
+ unsigned int start, unsigned int count,
+ unsigned int __always_unused index)
{
struct vfio_pci_irq_ctx *ctx;

if (!is_irq_none(vdev))
return -EINVAL;

+ if (start != 0 || count != 1)
+ return -EINVAL;
+
if (!vdev->pdev->irq)
return -ENODEV;

@@ -332,7 +337,8 @@ static char *vfio_intx_device_name(struct vfio_pci_core_device *vdev,
return kasprintf(GFP_KERNEL_ACCOUNT, "vfio-intx(%s)", pci_name(pdev));
}

-static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
+static void vfio_intx_disable(struct vfio_pci_core_device *vdev,
+ unsigned int __always_unused index)
{
struct vfio_pci_irq_ctx *ctx;

@@ -358,17 +364,20 @@ static irqreturn_t vfio_msihandler(int irq, void *arg)
return IRQ_HANDLED;
}

-static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec,
+static int vfio_msi_enable(struct vfio_pci_core_device *vdev,
+ unsigned int start, unsigned int count,
unsigned int index)
{
struct pci_dev *pdev = vdev->pdev;
unsigned int flag;
- int ret;
+ int ret, nvec;
u16 cmd;

if (!is_irq_none(vdev))
return -EINVAL;

+ nvec = start + count;
+
flag = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? PCI_IRQ_MSIX : PCI_IRQ_MSI;

/* return the number of supported vectors if we can't get all: */
@@ -701,11 +710,11 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
unsigned int i;

if (is_intx(vdev) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
- vfio_intx_disable(vdev);
+ vfio_intx_disable(vdev, index);
return 0;
}

- if (!(is_intx(vdev) || is_irq_none(vdev)) || start != 0 || count != 1)
+ if (!(is_intx(vdev) || is_irq_none(vdev)))
return -EINVAL;

if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
@@ -715,13 +724,13 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
if (is_intx(vdev))
return vfio_irq_set_block(vdev, start, count, fds, index);

- ret = vfio_intx_enable(vdev);
+ ret = vfio_intx_enable(vdev, start, count, index);
if (ret)
return ret;

ret = vfio_irq_set_block(vdev, start, count, fds, index);
if (ret)
- vfio_intx_disable(vdev);
+ vfio_intx_disable(vdev, index);

return ret;
}
@@ -771,7 +780,7 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
return vfio_irq_set_block(vdev, start, count,
fds, index);

- ret = vfio_msi_enable(vdev, start + count, index);
+ ret = vfio_msi_enable(vdev, start, count, index);
if (ret)
return ret;

--
2.34.1


2024-02-02 05:02:38

by Reinette Chatre

[permalink] [raw]
Subject: [PATCH 11/17] vfio/pci: Perform MSI/MSI-X interrupt management via callbacks

vfio_msi_set_vector_signal() is specific to MSI and MSI-X interrupt
management but its flow is the same as vfio_intx_set_signal() that
manages the INTx interrupts.

Replace the MSI and MSI-X specific calls with callbacks in preparation
for vfio_msi_set_vector_signal() to manage INTx interrupts also.

In preparation for support of INTx only the IRQ bypass code is
made optional.

Signed-off-by: Reinette Chatre <[email protected]>
---
Note to maintainers:
This change resembles "vfio/pci: Replace backend specific calls with
callbacks" that formed part of the IMS submission, but is not
specific to IMS.
https://lore.kernel.org/lkml/[email protected]

drivers/vfio/pci/vfio_pci_intrs.c | 44 +++++++++++++++++++++++++++----
1 file changed, 39 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 6eef4e2d7c13..07dc388c4513 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -31,6 +31,21 @@ struct vfio_pci_irq_ctx {
struct irq_bypass_producer producer;
};

+struct vfio_pci_intr_ops {
+ int (*request_interrupt)(struct vfio_pci_core_device *vdev,
+ struct vfio_pci_irq_ctx *ctx,
+ unsigned int vector, unsigned int index);
+ void (*free_interrupt)(struct vfio_pci_core_device *vdev,
+ struct vfio_pci_irq_ctx *ctx,
+ unsigned int vector);
+ char *(*device_name)(struct vfio_pci_core_device *vdev,
+ unsigned int vector, unsigned int index);
+ void (*register_producer)(struct vfio_pci_core_device *vdev,
+ struct vfio_pci_irq_ctx *ctx,
+ unsigned int vector);
+ void (*unregister_producer)(struct vfio_pci_irq_ctx *ctx);
+};
+
static bool irq_is(struct vfio_pci_core_device *vdev, int type)
{
return vdev->irq_type == type;
@@ -525,6 +540,23 @@ static void vfio_msi_unregister_producer(struct vfio_pci_irq_ctx *ctx)
irq_bypass_unregister_producer(&ctx->producer);
}

+static struct vfio_pci_intr_ops intr_ops[] = {
+ [VFIO_PCI_MSI_IRQ_INDEX] = {
+ .request_interrupt = vfio_msi_request_interrupt,
+ .free_interrupt = vfio_msi_free_interrupt,
+ .device_name = vfio_msi_device_name,
+ .register_producer = vfio_msi_register_producer,
+ .unregister_producer = vfio_msi_unregister_producer,
+ },
+ [VFIO_PCI_MSIX_IRQ_INDEX] = {
+ .request_interrupt = vfio_msi_request_interrupt,
+ .free_interrupt = vfio_msi_free_interrupt,
+ .device_name = vfio_msi_device_name,
+ .register_producer = vfio_msi_register_producer,
+ .unregister_producer = vfio_msi_unregister_producer,
+ },
+};
+
static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
unsigned int vector, int fd,
unsigned int index)
@@ -536,8 +568,9 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
ctx = vfio_irq_ctx_get(vdev, vector);

if (ctx && ctx->trigger) {
- vfio_msi_unregister_producer(ctx);
- vfio_msi_free_interrupt(vdev, ctx, vector);
+ if (intr_ops[index].unregister_producer)
+ intr_ops[index].unregister_producer(ctx);
+ intr_ops[index].free_interrupt(vdev, ctx, vector);
kfree(ctx->name);
ctx->name = NULL;
eventfd_ctx_put(ctx->trigger);
@@ -554,7 +587,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
return -ENOMEM;
}

- ctx->name = vfio_msi_device_name(vdev, vector, index);
+ ctx->name = intr_ops[index].device_name(vdev, vector, index);
if (!ctx->name)
return -ENOMEM;

@@ -566,11 +599,12 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,

ctx->trigger = trigger;

- ret = vfio_msi_request_interrupt(vdev, ctx, vector, index);
+ ret = intr_ops[index].request_interrupt(vdev, ctx, vector, index);
if (ret)
goto out_put_eventfd_ctx;

- vfio_msi_register_producer(vdev, ctx, vector);
+ if (intr_ops[index].register_producer)
+ intr_ops[index].register_producer(vdev, ctx, vector);

return 0;

--
2.34.1


2024-02-02 05:02:45

by Reinette Chatre

[permalink] [raw]
Subject: [PATCH 17/17] vfio/pci: Remove duplicate interrupt management flow

vfio_pci_set_intx_trigger() and vfio_pci_set_trigger() have the
same flow that calls interrupt type (INTx, MSI, MSI-X) specific
functions.

Create callbacks for the interrupt type specific code that
can be called by the shared code so that only one of these functions
are needed. Rename the final generic function shared by all
interrupt types vfio_pci_set_trigger().

Relocate the "IOCTL support" marker to correctly mark the
now generic code.

Signed-off-by: Reinette Chatre <[email protected]>
---
drivers/vfio/pci/vfio_pci_intrs.c | 104 ++++++++++--------------------
1 file changed, 35 insertions(+), 69 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index daa84a317f40..a5b337cfae60 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -32,6 +32,12 @@ struct vfio_pci_irq_ctx {
};

struct vfio_pci_intr_ops {
+ int (*enable)(struct vfio_pci_core_device *vdev, unsigned int start,
+ unsigned int count, unsigned int index);
+ void (*disable)(struct vfio_pci_core_device *vdev,
+ unsigned int index);
+ void (*send_eventfd)(struct vfio_pci_core_device *vdev,
+ struct vfio_pci_irq_ctx *ctx);
int (*request_interrupt)(struct vfio_pci_core_device *vdev,
struct vfio_pci_irq_ctx *ctx,
unsigned int vector, unsigned int index);
@@ -356,6 +362,12 @@ static void vfio_intx_disable(struct vfio_pci_core_device *vdev,
/*
* MSI/MSI-X
*/
+static void vfio_send_msi_eventfd(struct vfio_pci_core_device *vdev,
+ struct vfio_pci_irq_ctx *ctx)
+{
+ eventfd_signal(ctx->trigger);
+}
+
static irqreturn_t vfio_msihandler(int irq, void *arg)
{
struct eventfd_ctx *trigger = arg;
@@ -544,13 +556,22 @@ static void vfio_msi_unregister_producer(struct vfio_pci_irq_ctx *ctx)
irq_bypass_unregister_producer(&ctx->producer);
}

+/*
+ * IOCTL support
+ */
static struct vfio_pci_intr_ops intr_ops[] = {
[VFIO_PCI_INTX_IRQ_INDEX] = {
+ .enable = vfio_intx_enable,
+ .disable = vfio_intx_disable,
+ .send_eventfd = vfio_send_intx_eventfd_ctx,
.request_interrupt = vfio_intx_request_interrupt,
.free_interrupt = vfio_intx_free_interrupt,
.device_name = vfio_intx_device_name,
},
[VFIO_PCI_MSI_IRQ_INDEX] = {
+ .enable = vfio_msi_enable,
+ .disable = vfio_msi_disable,
+ .send_eventfd = vfio_send_msi_eventfd,
.request_interrupt = vfio_msi_request_interrupt,
.free_interrupt = vfio_msi_free_interrupt,
.device_name = vfio_msi_device_name,
@@ -558,6 +579,9 @@ static struct vfio_pci_intr_ops intr_ops[] = {
.unregister_producer = vfio_msi_unregister_producer,
},
[VFIO_PCI_MSIX_IRQ_INDEX] = {
+ .enable = vfio_msi_enable,
+ .disable = vfio_msi_disable,
+ .send_eventfd = vfio_send_msi_eventfd,
.request_interrupt = vfio_msi_request_interrupt,
.free_interrupt = vfio_msi_free_interrupt,
.device_name = vfio_msi_device_name,
@@ -646,9 +670,6 @@ static int vfio_irq_set_block(struct vfio_pci_core_device *vdev,
return ret;
}

-/*
- * IOCTL support
- */
static int vfio_pci_set_intx_unmask(struct vfio_pci_core_device *vdev,
unsigned int index, unsigned int start,
unsigned int count, uint32_t flags,
@@ -701,71 +722,16 @@ static int vfio_pci_set_intx_mask(struct vfio_pci_core_device *vdev,
return 0;
}

-static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
- unsigned int index, unsigned int start,
- unsigned int count, uint32_t flags,
- void *data)
-{
- struct vfio_pci_irq_ctx *ctx;
- unsigned int i;
-
- if (is_intx(vdev) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
- vfio_intx_disable(vdev, index);
- return 0;
- }
-
- if (!(is_intx(vdev) || is_irq_none(vdev)))
- return -EINVAL;
-
- if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
- int32_t *fds = data;
- int ret;
-
- if (is_intx(vdev))
- return vfio_irq_set_block(vdev, start, count, fds, index);
-
- ret = vfio_intx_enable(vdev, start, count, index);
- if (ret)
- return ret;
-
- ret = vfio_irq_set_block(vdev, start, count, fds, index);
- if (ret)
- vfio_intx_disable(vdev, index);
-
- return ret;
- }
-
- if (!is_intx(vdev))
- return -EINVAL;
-
- /* temporary */
- for (i = start; i < start + count; i++) {
- ctx = vfio_irq_ctx_get(vdev, i);
- if (!ctx || !ctx->trigger)
- continue;
- if (flags & VFIO_IRQ_SET_DATA_NONE) {
- vfio_send_intx_eventfd_ctx(vdev, ctx);
- } else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
- uint8_t *bools = data;
-
- if (bools[i - start])
- vfio_send_intx_eventfd_ctx(vdev, ctx);
- }
- }
-
- return 0;
-}
-
-static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
- unsigned int index, unsigned int start,
- unsigned int count, uint32_t flags,
- void *data)
+static int vfio_pci_set_trigger(struct vfio_pci_core_device *vdev,
+ unsigned int index, unsigned int start,
+ unsigned int count, uint32_t flags,
+ void *data)
{
struct vfio_pci_irq_ctx *ctx;
unsigned int i;

if (irq_is(vdev, index) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
- vfio_msi_disable(vdev, index);
+ intr_ops[index].disable(vdev, index);
return 0;
}

@@ -780,13 +746,13 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
return vfio_irq_set_block(vdev, start, count,
fds, index);

- ret = vfio_msi_enable(vdev, start, count, index);
+ ret = intr_ops[index].enable(vdev, start, count, index);
if (ret)
return ret;

ret = vfio_irq_set_block(vdev, start, count, fds, index);
if (ret)
- vfio_msi_disable(vdev, index);
+ intr_ops[index].disable(vdev, index);

return ret;
}
@@ -799,11 +765,11 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
if (!ctx || !ctx->trigger)
continue;
if (flags & VFIO_IRQ_SET_DATA_NONE) {
- eventfd_signal(ctx->trigger);
+ intr_ops[index].send_eventfd(vdev, ctx);
} else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
uint8_t *bools = data;
if (bools[i - start])
- eventfd_signal(ctx->trigger);
+ intr_ops[index].send_eventfd(vdev, ctx);
}
}
return 0;
@@ -912,7 +878,7 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
func = vfio_pci_set_intx_unmask;
break;
case VFIO_IRQ_SET_ACTION_TRIGGER:
- func = vfio_pci_set_intx_trigger;
+ func = vfio_pci_set_trigger;
break;
}
break;
@@ -924,7 +890,7 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
/* XXX Need masking support exported */
break;
case VFIO_IRQ_SET_ACTION_TRIGGER:
- func = vfio_pci_set_msi_trigger;
+ func = vfio_pci_set_trigger;
break;
}
break;
--
2.34.1


2024-02-02 05:03:14

by Reinette Chatre

[permalink] [raw]
Subject: [PATCH 14/17] vfio/pci: Add utility to trigger INTx eventfd knowing interrupt context

vfio_send_intx_eventfd() is available to signal the eventfd associated
with the INTx interrupt. It does so by first obtaining the vector's
interrupt context and then signaling the eventfd. The interrupt context
may already be available before vfio_send_intx_eventfd() is called,
making the additional query unnecessary.

Introduce vfio_send_intx_eventfd_ctx() that can be used to signal the
eventfd associated with the INTx interrupt when the interrupt context
is already known and use it instead of vfio_send_intx_eventfd() in the
one instance where the interrupt context is already known.

Replace usage of vfio_send_intx_eventfd() within
vfio_pci_set_intx_trigger() with a new snippet that results in the
same functionality while mirroring the flow vfio_pci_set_msi_trigger()
as a preparatory step to merge vfio_pci_set_msi_trigger() and
vfio_pci_set_intx_trigger(). The new snippet is marked as
"temporary" until the flows are merged.

Signed-off-by: Reinette Chatre <[email protected]>
---
drivers/vfio/pci/vfio_pci_intrs.c | 32 ++++++++++++++++++++++++-------
1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index d7c2cd739d74..37065623d286 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -103,6 +103,13 @@ vfio_irq_ctx_alloc(struct vfio_pci_core_device *vdev, unsigned long index)
/*
* INTx
*/
+static void vfio_send_intx_eventfd_ctx(struct vfio_pci_core_device *vdev,
+ struct vfio_pci_irq_ctx *ctx)
+{
+ if (likely(is_intx(vdev) && !vdev->virq_disabled))
+ eventfd_signal(ctx->trigger);
+}
+
static void vfio_send_intx_eventfd(void *opaque, void *unused)
{
struct vfio_pci_core_device *vdev = opaque;
@@ -245,7 +252,7 @@ static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
spin_unlock_irqrestore(&vdev->irqlock, flags);

if (ret == IRQ_HANDLED)
- vfio_send_intx_eventfd(vdev, NULL);
+ vfio_send_intx_eventfd_ctx(vdev, ctx);

return ret;
}
@@ -690,6 +697,9 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
unsigned int count, uint32_t flags,
void *data)
{
+ struct vfio_pci_irq_ctx *ctx;
+ unsigned int i;
+
if (is_intx(vdev) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
vfio_intx_disable(vdev);
return 0;
@@ -719,13 +729,21 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
if (!is_intx(vdev))
return -EINVAL;

- if (flags & VFIO_IRQ_SET_DATA_NONE) {
- vfio_send_intx_eventfd(vdev, NULL);
- } else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
- uint8_t trigger = *(uint8_t *)data;
- if (trigger)
- vfio_send_intx_eventfd(vdev, NULL);
+ /* temporary */
+ for (i = start; i < start + count; i++) {
+ ctx = vfio_irq_ctx_get(vdev, i);
+ if (!ctx || !ctx->trigger)
+ continue;
+ if (flags & VFIO_IRQ_SET_DATA_NONE) {
+ vfio_send_intx_eventfd_ctx(vdev, ctx);
+ } else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
+ uint8_t *bools = data;
+
+ if (bools[i - start])
+ vfio_send_intx_eventfd_ctx(vdev, ctx);
+ }
}
+
return 0;
}

--
2.34.1


2024-02-02 05:05:12

by Reinette Chatre

[permalink] [raw]
Subject: [PATCH 16/17] vfio/pci: Move vfio_msi_disable() to be with other MSI/MSI-X management code

The interrupt management code is mostly organized in four sections:
shared code (interrupt type checking and interrupt context management),
INTx management code, MSI/MSI-X management code, and IOCTL support.

vfio_msi_disable() is separate from the other MSI/MSI-X code. This may
have been required because vfio_msi_disable() relies on
vfio_irq_set_vector_signal() within the IOCTL support.

Since vfio_irq_set_vector_signal() is declared earlier it is not
required for MSI/MSI-X management code to be mixed with the IOCTL
support.

Move vfio_msi_disable() to be located with all the other MSI/MSI-X
management code.

This move makes it simpler to initialize the interrupt management
callbacks with vfio_msi_disable() so that it can be provided to the
IOCTL support code.

Signed-off-by: Reinette Chatre <[email protected]>
---
drivers/vfio/pci/vfio_pci_intrs.c | 58 +++++++++++++++----------------
1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 9217fea3f636..daa84a317f40 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -404,6 +404,35 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev,
return 0;
}

+static void vfio_msi_disable(struct vfio_pci_core_device *vdev,
+ unsigned int index)
+{
+ struct pci_dev *pdev = vdev->pdev;
+ struct vfio_pci_irq_ctx *ctx;
+ unsigned long i;
+ u16 cmd;
+
+ xa_for_each(&vdev->ctx, i, ctx) {
+ vfio_virqfd_disable(&ctx->unmask);
+ vfio_virqfd_disable(&ctx->mask);
+ vfio_irq_set_vector_signal(vdev, i, -1, index);
+ vfio_irq_ctx_free(vdev, ctx, i);
+ }
+
+ cmd = vfio_pci_memory_lock_and_enable(vdev);
+ pci_free_irq_vectors(pdev);
+ vfio_pci_memory_unlock_and_restore(vdev, cmd);
+
+ /*
+ * Both disable paths above use pci_intx_for_msi() to clear DisINTx
+ * via their shutdown paths. Restore for NoINTx devices.
+ */
+ if (vdev->nointx)
+ pci_intx(pdev, 0);
+
+ vdev->irq_type = VFIO_PCI_NUM_IRQS;
+}
+
/*
* vfio_msi_alloc_irq() returns the Linux IRQ number of an MSI or MSI-X device
* interrupt vector. If a Linux IRQ number is not available then a new
@@ -617,35 +646,6 @@ static int vfio_irq_set_block(struct vfio_pci_core_device *vdev,
return ret;
}

-static void vfio_msi_disable(struct vfio_pci_core_device *vdev,
- unsigned int index)
-{
- struct pci_dev *pdev = vdev->pdev;
- struct vfio_pci_irq_ctx *ctx;
- unsigned long i;
- u16 cmd;
-
- xa_for_each(&vdev->ctx, i, ctx) {
- vfio_virqfd_disable(&ctx->unmask);
- vfio_virqfd_disable(&ctx->mask);
- vfio_irq_set_vector_signal(vdev, i, -1, index);
- vfio_irq_ctx_free(vdev, ctx, i);
- }
-
- cmd = vfio_pci_memory_lock_and_enable(vdev);
- pci_free_irq_vectors(pdev);
- vfio_pci_memory_unlock_and_restore(vdev, cmd);
-
- /*
- * Both disable paths above use pci_intx_for_msi() to clear DisINTx
- * via their shutdown paths. Restore for NoINTx devices.
- */
- if (vdev->nointx)
- pci_intx(pdev, 0);
-
- vdev->irq_type = VFIO_PCI_NUM_IRQS;
-}
-
/*
* IOCTL support
*/
--
2.34.1


2024-02-05 22:37:21

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 17/17] vfio/pci: Remove duplicate interrupt management flow

On Thu, 1 Feb 2024 20:57:11 -0800
Reinette Chatre <[email protected]> wrote:

> vfio_pci_set_intx_trigger() and vfio_pci_set_trigger() have the
> same flow that calls interrupt type (INTx, MSI, MSI-X) specific
> functions.
>
> Create callbacks for the interrupt type specific code that
> can be called by the shared code so that only one of these functions
> are needed. Rename the final generic function shared by all
> interrupt types vfio_pci_set_trigger().
>
> Relocate the "IOCTL support" marker to correctly mark the
> now generic code.
>
> Signed-off-by: Reinette Chatre <[email protected]>
> ---
> drivers/vfio/pci/vfio_pci_intrs.c | 104 ++++++++++--------------------
> 1 file changed, 35 insertions(+), 69 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index daa84a317f40..a5b337cfae60 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -32,6 +32,12 @@ struct vfio_pci_irq_ctx {
> };
>
> struct vfio_pci_intr_ops {
> + int (*enable)(struct vfio_pci_core_device *vdev, unsigned int start,
> + unsigned int count, unsigned int index);
> + void (*disable)(struct vfio_pci_core_device *vdev,
> + unsigned int index);
> + void (*send_eventfd)(struct vfio_pci_core_device *vdev,
> + struct vfio_pci_irq_ctx *ctx);
> int (*request_interrupt)(struct vfio_pci_core_device *vdev,
> struct vfio_pci_irq_ctx *ctx,
> unsigned int vector, unsigned int index);
> @@ -356,6 +362,12 @@ static void vfio_intx_disable(struct vfio_pci_core_device *vdev,
> /*
> * MSI/MSI-X
> */
> +static void vfio_send_msi_eventfd(struct vfio_pci_core_device *vdev,
> + struct vfio_pci_irq_ctx *ctx)
> +{
> + eventfd_signal(ctx->trigger);
> +}
> +
> static irqreturn_t vfio_msihandler(int irq, void *arg)
> {
> struct eventfd_ctx *trigger = arg;
> @@ -544,13 +556,22 @@ static void vfio_msi_unregister_producer(struct vfio_pci_irq_ctx *ctx)
> irq_bypass_unregister_producer(&ctx->producer);
> }
>
> +/*
> + * IOCTL support
> + */
> static struct vfio_pci_intr_ops intr_ops[] = {
> [VFIO_PCI_INTX_IRQ_INDEX] = {
> + .enable = vfio_intx_enable,
> + .disable = vfio_intx_disable,
> + .send_eventfd = vfio_send_intx_eventfd_ctx,
> .request_interrupt = vfio_intx_request_interrupt,
> .free_interrupt = vfio_intx_free_interrupt,
> .device_name = vfio_intx_device_name,
> },
> [VFIO_PCI_MSI_IRQ_INDEX] = {
> + .enable = vfio_msi_enable,
> + .disable = vfio_msi_disable,
> + .send_eventfd = vfio_send_msi_eventfd,
> .request_interrupt = vfio_msi_request_interrupt,
> .free_interrupt = vfio_msi_free_interrupt,
> .device_name = vfio_msi_device_name,
> @@ -558,6 +579,9 @@ static struct vfio_pci_intr_ops intr_ops[] = {
> .unregister_producer = vfio_msi_unregister_producer,
> },
> [VFIO_PCI_MSIX_IRQ_INDEX] = {
> + .enable = vfio_msi_enable,
> + .disable = vfio_msi_disable,
> + .send_eventfd = vfio_send_msi_eventfd,
> .request_interrupt = vfio_msi_request_interrupt,
> .free_interrupt = vfio_msi_free_interrupt,
> .device_name = vfio_msi_device_name,
> @@ -646,9 +670,6 @@ static int vfio_irq_set_block(struct vfio_pci_core_device *vdev,
> return ret;
> }
>
> -/*
> - * IOCTL support
> - */
> static int vfio_pci_set_intx_unmask(struct vfio_pci_core_device *vdev,
> unsigned int index, unsigned int start,
> unsigned int count, uint32_t flags,
> @@ -701,71 +722,16 @@ static int vfio_pci_set_intx_mask(struct vfio_pci_core_device *vdev,
> return 0;
> }
>
> -static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
> - unsigned int index, unsigned int start,
> - unsigned int count, uint32_t flags,
> - void *data)
> -{
> - struct vfio_pci_irq_ctx *ctx;
> - unsigned int i;
> -
> - if (is_intx(vdev) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
> - vfio_intx_disable(vdev, index);
> - return 0;
> - }
> -
> - if (!(is_intx(vdev) || is_irq_none(vdev)))
> - return -EINVAL;
> -
> - if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
> - int32_t *fds = data;
> - int ret;
> -
> - if (is_intx(vdev))
> - return vfio_irq_set_block(vdev, start, count, fds, index);
> -
> - ret = vfio_intx_enable(vdev, start, count, index);
> - if (ret)
> - return ret;
> -
> - ret = vfio_irq_set_block(vdev, start, count, fds, index);
> - if (ret)
> - vfio_intx_disable(vdev, index);
> -
> - return ret;
> - }
> -
> - if (!is_intx(vdev))
> - return -EINVAL;
> -
> - /* temporary */
> - for (i = start; i < start + count; i++) {
> - ctx = vfio_irq_ctx_get(vdev, i);
> - if (!ctx || !ctx->trigger)
> - continue;
> - if (flags & VFIO_IRQ_SET_DATA_NONE) {
> - vfio_send_intx_eventfd_ctx(vdev, ctx);
> - } else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
> - uint8_t *bools = data;
> -
> - if (bools[i - start])
> - vfio_send_intx_eventfd_ctx(vdev, ctx);
> - }
> - }
> -
> - return 0;
> -}
> -
> -static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
> - unsigned int index, unsigned int start,
> - unsigned int count, uint32_t flags,
> - void *data)
> +static int vfio_pci_set_trigger(struct vfio_pci_core_device *vdev,
> + unsigned int index, unsigned int start,
> + unsigned int count, uint32_t flags,
> + void *data)
> {
> struct vfio_pci_irq_ctx *ctx;
> unsigned int i;
>
> if (irq_is(vdev, index) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
> - vfio_msi_disable(vdev, index);
> + intr_ops[index].disable(vdev, index);
> return 0;
> }
>
> @@ -780,13 +746,13 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
> return vfio_irq_set_block(vdev, start, count,
> fds, index);
>
> - ret = vfio_msi_enable(vdev, start, count, index);
> + ret = intr_ops[index].enable(vdev, start, count, index);
> if (ret)
> return ret;
>
> ret = vfio_irq_set_block(vdev, start, count, fds, index);
> if (ret)
> - vfio_msi_disable(vdev, index);
> + intr_ops[index].disable(vdev, index);
>
> return ret;
> }
> @@ -799,11 +765,11 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
> if (!ctx || !ctx->trigger)
> continue;
> if (flags & VFIO_IRQ_SET_DATA_NONE) {
> - eventfd_signal(ctx->trigger);
> + intr_ops[index].send_eventfd(vdev, ctx);
> } else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
> uint8_t *bools = data;

Nit, an opportunity to add the missing new line between variable
declaration and code here. Thanks,

Alex

> if (bools[i - start])
> - eventfd_signal(ctx->trigger);
> + intr_ops[index].send_eventfd(vdev, ctx);
> }
> }
> return 0;
> @@ -912,7 +878,7 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
> func = vfio_pci_set_intx_unmask;
> break;
> case VFIO_IRQ_SET_ACTION_TRIGGER:
> - func = vfio_pci_set_intx_trigger;
> + func = vfio_pci_set_trigger;
> break;
> }
> break;
> @@ -924,7 +890,7 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
> /* XXX Need masking support exported */
> break;
> case VFIO_IRQ_SET_ACTION_TRIGGER:
> - func = vfio_pci_set_msi_trigger;
> + func = vfio_pci_set_trigger;
> break;
> }
> break;


2024-02-05 22:38:00

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 15/17] vfio/pci: Let enable and disable of interrupt types use same signature

On Thu, 1 Feb 2024 20:57:09 -0800
Reinette Chatre <[email protected]> wrote:

> vfio_pci_set_intx_trigger() and vfio_pci_set_msi_trigger() have
> flows that can be shared.
>
> For INTx, MSI, and MSI-X interrupt management to share the
> same enable/disable flow the interrupt specific enable and
> disable functions should have the same signatures.
>
> Let vfio_intx_enable() and vfio_msi_enable() use the same
> parameters by passing "start" and "count" to these functions
> instead of letting the (what will eventually be) common code
> interpret these values.
>
> Providing "start" and "count" to vfio_intx_enable()
> enables the INTx specific check of these parameters to move into
> the INTx specific vfio_intx_enable(). Similarly, providing "start"
> and "count" to vfio_msi_enable() enables the MSI/MSI-X specific
> code to initialize number of vectors needed.
>
> The shared MSI/MSI-X code needs the interrupt index. Provide
> the interrupt index (clearly marked as unused) to the INTx code
> to use the same signatures.
>
> With interrupt type specific code using the same parameters it
> is possible to have common code that calls the enable/disable
> code for different interrupt types.
>
> Signed-off-by: Reinette Chatre <[email protected]>
> ---
> drivers/vfio/pci/vfio_pci_intrs.c | 27 ++++++++++++++++++---------
> 1 file changed, 18 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 37065623d286..9217fea3f636 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -257,13 +257,18 @@ static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
> return ret;
> }
>
> -static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
> +static int vfio_intx_enable(struct vfio_pci_core_device *vdev,
> + unsigned int start, unsigned int count,
> + unsigned int __always_unused index)
> {
> struct vfio_pci_irq_ctx *ctx;
>
> if (!is_irq_none(vdev))
> return -EINVAL;
>
> + if (start != 0 || count != 1)
> + return -EINVAL;
> +
> if (!vdev->pdev->irq)
> return -ENODEV;
>
> @@ -332,7 +337,8 @@ static char *vfio_intx_device_name(struct vfio_pci_core_device *vdev,
> return kasprintf(GFP_KERNEL_ACCOUNT, "vfio-intx(%s)", pci_name(pdev));
> }
>
> -static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
> +static void vfio_intx_disable(struct vfio_pci_core_device *vdev,
> + unsigned int __always_unused index)
> {
> struct vfio_pci_irq_ctx *ctx;
>
> @@ -358,17 +364,20 @@ static irqreturn_t vfio_msihandler(int irq, void *arg)
> return IRQ_HANDLED;
> }
>
> -static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec,
> +static int vfio_msi_enable(struct vfio_pci_core_device *vdev,
> + unsigned int start, unsigned int count,
> unsigned int index)
> {
> struct pci_dev *pdev = vdev->pdev;
> unsigned int flag;
> - int ret;
> + int ret, nvec;
> u16 cmd;
>
> if (!is_irq_none(vdev))
> return -EINVAL;
>
> + nvec = start + count;
> +
> flag = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? PCI_IRQ_MSIX : PCI_IRQ_MSI;
>
> /* return the number of supported vectors if we can't get all: */
> @@ -701,11 +710,11 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
> unsigned int i;
>
> if (is_intx(vdev) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
> - vfio_intx_disable(vdev);
> + vfio_intx_disable(vdev, index);
> return 0;
> }
>
> - if (!(is_intx(vdev) || is_irq_none(vdev)) || start != 0 || count != 1)
> + if (!(is_intx(vdev) || is_irq_none(vdev)))
> return -EINVAL;
>
> if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
> @@ -715,13 +724,13 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
> if (is_intx(vdev))
> return vfio_irq_set_block(vdev, start, count, fds, index);
>
> - ret = vfio_intx_enable(vdev);
> + ret = vfio_intx_enable(vdev, start, count, index);

Please trace what happens when a user calls SET_IRQS to setup a trigger
eventfd with start = 0, count = 1, followed by any other combination of
start and count values once is_intx() is true. vfio_intx_enable()
cannot be the only place we bounds check the user, all of the INTx
callbacks should be an error or nop if vector != 0. Thanks,

Alex

> if (ret)
> return ret;
>
> ret = vfio_irq_set_block(vdev, start, count, fds, index);
> if (ret)
> - vfio_intx_disable(vdev);
> + vfio_intx_disable(vdev, index);
>
> return ret;
> }
> @@ -771,7 +780,7 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
> return vfio_irq_set_block(vdev, start, count,
> fds, index);
>
> - ret = vfio_msi_enable(vdev, start + count, index);
> + ret = vfio_msi_enable(vdev, start, count, index);
> if (ret)
> return ret;
>


2024-02-05 22:51:59

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 12/17] vfio/pci: Remove msi term from generic code

On Thu, 1 Feb 2024 20:57:06 -0800
Reinette Chatre <[email protected]> wrote:

> vfio_msi_set_vector_signal() and by extension vfio_msi_set_block()
> are no longer specific to MSI and MSI-X interrupts.
>
> Change the name of these functions in preparation for them
> to be used for management of INTx interrupts.
>
> Signed-off-by: Reinette Chatre <[email protected]>
> ---
> drivers/vfio/pci/vfio_pci_intrs.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 07dc388c4513..7f9dc81cb97f 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -557,7 +557,7 @@ static struct vfio_pci_intr_ops intr_ops[] = {
> },
> };
>
> -static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
> +static int vfio_irq_set_vector_signal(struct vfio_pci_core_device *vdev,
> unsigned int vector, int fd,
> unsigned int index)
> {
> @@ -617,7 +617,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
> return ret;
> }
>
> -static int vfio_msi_set_block(struct vfio_pci_core_device *vdev,
> +static int vfio_irq_set_block(struct vfio_pci_core_device *vdev,
> unsigned int start, unsigned int count,
> int32_t *fds, unsigned int index)
> {
> @@ -626,12 +626,12 @@ static int vfio_msi_set_block(struct vfio_pci_core_device *vdev,
>
> for (i = 0, j = start; i < count && !ret; i++, j++) {
> int fd = fds ? fds[i] : -1;
> - ret = vfio_msi_set_vector_signal(vdev, j, fd, index);
> + ret = vfio_irq_set_vector_signal(vdev, j, fd, index);
> }
>
> if (ret) {
> for (i = start; i < j; i++)
> - vfio_msi_set_vector_signal(vdev, i, -1, index);
> + vfio_irq_set_vector_signal(vdev, i, -1, index);
> }
>
> return ret;
> @@ -648,7 +648,7 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev,
> xa_for_each(&vdev->ctx, i, ctx) {
> vfio_virqfd_disable(&ctx->unmask);
> vfio_virqfd_disable(&ctx->mask);
> - vfio_msi_set_vector_signal(vdev, i, -1, index);
> + vfio_irq_set_vector_signal(vdev, i, -1, index);
> vfio_irq_ctx_free(vdev, ctx, i);
> }
>
> @@ -786,14 +786,14 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
> int ret;
>
> if (vdev->irq_type == index)

It's unfortunate that the one place in the code that doesn't use
irq_is() is the one that turns into the central callout function, maybe
we can fix it along the way. Thanks,

Alex


> - return vfio_msi_set_block(vdev, start, count,
> + return vfio_irq_set_block(vdev, start, count,
> fds, index);
>
> ret = vfio_msi_enable(vdev, start + count, index);
> if (ret)
> return ret;
>
> - ret = vfio_msi_set_block(vdev, start, count, fds, index);
> + ret = vfio_irq_set_block(vdev, start, count, fds, index);
> if (ret)
> vfio_msi_disable(vdev, index);
>


2024-02-05 22:52:50

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 06/17] vfio/pci: Remove interrupt index interpretation from wrappers

On Thu, 1 Feb 2024 20:57:00 -0800
Reinette Chatre <[email protected]> wrote:

> vfio_pci_set_intx_trigger() and vfio_pci_set_msi_trigger() have similar
> enough flows that can be converged into one shared flow instead of
> duplicated.
>
> To share code between management of interrupt types it is necessary that
> the type of the interrupt is only interpreted by the code specific to
> the interrupt type being managed.
>
> Remove interrupt type interpretation from what will eventually be
> shared code (vfio_pci_set_msi_trigger(), vfio_msi_set_block()) by
> pushing this interpretation to be within the interrupt
> type specific code (vfio_msi_enable(),
> (temporary) vfio_msi_set_vector_signal()).
>
> Signed-off-by: Reinette Chatre <[email protected]>
> ---
> Note to maintainers:
> Originally formed part of the IMS submission below, but is not
> specific to IMS.
> https://lore.kernel.org/lkml/[email protected]
>
> drivers/vfio/pci/vfio_pci_intrs.c | 38 +++++++++++++++++--------------
> 1 file changed, 21 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 97a3bb22b186..31f73c70fcd2 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -346,16 +346,19 @@ static irqreturn_t vfio_msihandler(int irq, void *arg)
> return IRQ_HANDLED;
> }
>
> -static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msix)
> +static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec,
> + unsigned int index)
> {
> struct pci_dev *pdev = vdev->pdev;
> - unsigned int flag = msix ? PCI_IRQ_MSIX : PCI_IRQ_MSI;
> + unsigned int flag;
> int ret;
> u16 cmd;
>
> if (!is_irq_none(vdev))
> return -EINVAL;
>
> + flag = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? PCI_IRQ_MSIX : PCI_IRQ_MSI;
> +
> /* return the number of supported vectors if we can't get all: */
> cmd = vfio_pci_memory_lock_and_enable(vdev);
> ret = pci_alloc_irq_vectors(pdev, 1, nvec, flag);
> @@ -367,10 +370,9 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi
> }
> vfio_pci_memory_unlock_and_restore(vdev, cmd);
>
> - vdev->irq_type = msix ? VFIO_PCI_MSIX_IRQ_INDEX :
> - VFIO_PCI_MSI_IRQ_INDEX;
> + vdev->irq_type = index;
>
> - if (!msix) {
> + if (index == VFIO_PCI_MSI_IRQ_INDEX) {
> /*
> * Compute the virtual hardware field for max msi vectors -
> * it is the log base 2 of the number of vectors.
> @@ -413,8 +415,10 @@ static int vfio_msi_alloc_irq(struct vfio_pci_core_device *vdev,
> }
>
> static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
> - unsigned int vector, int fd, bool msix)
> + unsigned int vector, int fd,
> + unsigned int index)
> {
> + bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;

Cleanup the unnecessary ternary while here, this can just be:

bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX);

Thanks,

Alex

> struct pci_dev *pdev = vdev->pdev;
> struct vfio_pci_irq_ctx *ctx;
> struct eventfd_ctx *trigger;
> @@ -505,25 +509,26 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
>
> static int vfio_msi_set_block(struct vfio_pci_core_device *vdev,
> unsigned int start, unsigned int count,
> - int32_t *fds, bool msix)
> + int32_t *fds, unsigned int index)
> {
> unsigned int i, j;
> int ret = 0;
>
> for (i = 0, j = start; i < count && !ret; i++, j++) {
> int fd = fds ? fds[i] : -1;
> - ret = vfio_msi_set_vector_signal(vdev, j, fd, msix);
> + ret = vfio_msi_set_vector_signal(vdev, j, fd, index);
> }
>
> if (ret) {
> for (i = start; i < j; i++)
> - vfio_msi_set_vector_signal(vdev, i, -1, msix);
> + vfio_msi_set_vector_signal(vdev, i, -1, index);
> }
>
> return ret;
> }
>
> -static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
> +static void vfio_msi_disable(struct vfio_pci_core_device *vdev,
> + unsigned int index)
> {
> struct pci_dev *pdev = vdev->pdev;
> struct vfio_pci_irq_ctx *ctx;
> @@ -533,7 +538,7 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
> xa_for_each(&vdev->ctx, i, ctx) {
> vfio_virqfd_disable(&ctx->unmask);
> vfio_virqfd_disable(&ctx->mask);
> - vfio_msi_set_vector_signal(vdev, i, -1, msix);
> + vfio_msi_set_vector_signal(vdev, i, -1, index);
> }
>
> cmd = vfio_pci_memory_lock_and_enable(vdev);
> @@ -656,10 +661,9 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
> {
> struct vfio_pci_irq_ctx *ctx;
> unsigned int i;
> - bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
>
> if (irq_is(vdev, index) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
> - vfio_msi_disable(vdev, msix);
> + vfio_msi_disable(vdev, index);
> return 0;
> }
>
> @@ -672,15 +676,15 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
>
> if (vdev->irq_type == index)
> return vfio_msi_set_block(vdev, start, count,
> - fds, msix);
> + fds, index);
>
> - ret = vfio_msi_enable(vdev, start + count, msix);
> + ret = vfio_msi_enable(vdev, start + count, index);
> if (ret)
> return ret;
>
> - ret = vfio_msi_set_block(vdev, start, count, fds, msix);
> + ret = vfio_msi_set_block(vdev, start, count, fds, index);
> if (ret)
> - vfio_msi_disable(vdev, msix);
> + vfio_msi_disable(vdev, index);
>
> return ret;
> }


2024-02-05 22:52:57

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 07/17] vfio/pci: Preserve per-interrupt contexts

On Thu, 1 Feb 2024 20:57:01 -0800
Reinette Chatre <[email protected]> wrote:

> MSI and MSI-X interrupt management for PCI passthrough devices create
> a new per-interrupt context every time an interrupt is allocated,
> freeing it when the interrupt is freed.
>
> The per-interrupt context contains the properties of a particular
> interrupt. Without a property that persists across interrupt allocation
> and free it is acceptable to always create a new per-interrupt context.
>
> INTx interrupt context has a "masked" property that persists across
> allocation and free and thus preserves its interrupt context
> across interrupt allocation and free calls.
>
> MSI and MSI-X interrupts already remain allocated across interrupt
> allocation and free requests, additionally maintaining the
> individual interrupt context is a reflection of this existing
> behavior and matches INTx behavior so that more code can be shared.
>
> An additional benefit is that maintaining interrupt context supports
> a potential future use case of emulated interrupts, where the
> "is this interrupt emulated" is a property that needs to persist
> across allocation and free requests.
>
> Persistent interrupt contexts means that existence of per-interrupt
> context no longer implies a valid trigger, pointers to freed memory
> should be cleared, and a new per-interrupt context cannot be assumed
> needing allocation when an interrupt is allocated.
>
> Signed-off-by: Reinette Chatre <[email protected]>
> ---
> Note to maintainers:
> This addition originally formed part of the IMS work below that mostly
> ignored INTx. This work focuses on INTx, MSI, MSI-X where this addition
> is relevant.
> https://lore.kernel.org/lkml/[email protected]
>
> drivers/vfio/pci/vfio_pci_intrs.c | 26 ++++++++++++++------------
> 1 file changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 31f73c70fcd2..7ca2b983b66e 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -427,7 +427,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
>
> ctx = vfio_irq_ctx_get(vdev, vector);
>
> - if (ctx) {
> + if (ctx && ctx->trigger) {
> irq_bypass_unregister_producer(&ctx->producer);
> irq = pci_irq_vector(pdev, vector);
> cmd = vfio_pci_memory_lock_and_enable(vdev);
> @@ -435,8 +435,9 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
> vfio_pci_memory_unlock_and_restore(vdev, cmd);
> /* Interrupt stays allocated, will be freed at MSI-X disable. */
> kfree(ctx->name);
> + ctx->name = NULL;

Setting ctx->name = NULL is not strictly necessary and does not match
the INTx code that we're claiming to try to emulate. ctx->name is only
tested immediately after allocation below, otherwise it can be inferred
from ctx->trigger. Thanks,

Alex

> eventfd_ctx_put(ctx->trigger);
> - vfio_irq_ctx_free(vdev, ctx, vector);
> + ctx->trigger = NULL;
> }
>
> if (fd < 0)
> @@ -449,16 +450,17 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
> return irq;
> }
>
> - ctx = vfio_irq_ctx_alloc(vdev, vector);
> - if (!ctx)
> - return -ENOMEM;
> + /* Per-interrupt context remain allocated. */
> + if (!ctx) {
> + ctx = vfio_irq_ctx_alloc(vdev, vector);
> + if (!ctx)
> + return -ENOMEM;
> + }
>
> ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-msi%s[%d](%s)",
> msix ? "x" : "", vector, pci_name(pdev));
> - if (!ctx->name) {
> - ret = -ENOMEM;
> - goto out_free_ctx;
> - }
> + if (!ctx->name)
> + return -ENOMEM;
>
> trigger = eventfd_ctx_fdget(fd);
> if (IS_ERR(trigger)) {
> @@ -502,8 +504,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
> eventfd_ctx_put(trigger);
> out_free_name:
> kfree(ctx->name);
> -out_free_ctx:
> - vfio_irq_ctx_free(vdev, ctx, vector);
> + ctx->name = NULL;
> return ret;
> }
>
> @@ -539,6 +540,7 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev,
> vfio_virqfd_disable(&ctx->unmask);
> vfio_virqfd_disable(&ctx->mask);
> vfio_msi_set_vector_signal(vdev, i, -1, index);
> + vfio_irq_ctx_free(vdev, ctx, i);
> }
>
> cmd = vfio_pci_memory_lock_and_enable(vdev);
> @@ -694,7 +696,7 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
>
> for (i = start; i < start + count; i++) {
> ctx = vfio_irq_ctx_get(vdev, i);
> - if (!ctx)
> + if (!ctx || !ctx->trigger)
> continue;
> if (flags & VFIO_IRQ_SET_DATA_NONE) {
> eventfd_signal(ctx->trigger);


2024-02-06 21:45:19

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH 06/17] vfio/pci: Remove interrupt index interpretation from wrappers

Hi Alex,

On 2/5/2024 2:35 PM, Alex Williamson wrote:
> On Thu, 1 Feb 2024 20:57:00 -0800
> Reinette Chatre <[email protected]> wrote:

>> @@ -413,8 +415,10 @@ static int vfio_msi_alloc_irq(struct vfio_pci_core_device *vdev,
>> }
>>
>> static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
>> - unsigned int vector, int fd, bool msix)
>> + unsigned int vector, int fd,
>> + unsigned int index)
>> {
>> + bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
>
> Cleanup the unnecessary ternary while here, this can just be:
>
> bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX);
>

Will do. Thank you.

Reinette

2024-02-06 21:45:46

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH 07/17] vfio/pci: Preserve per-interrupt contexts

Hi Alex,

On 2/5/2024 2:35 PM, Alex Williamson wrote:
> On Thu, 1 Feb 2024 20:57:01 -0800
> Reinette Chatre <[email protected]> wrote:

..

>> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
>> index 31f73c70fcd2..7ca2b983b66e 100644
>> --- a/drivers/vfio/pci/vfio_pci_intrs.c
>> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
>> @@ -427,7 +427,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
>>
>> ctx = vfio_irq_ctx_get(vdev, vector);
>>
>> - if (ctx) {
>> + if (ctx && ctx->trigger) {
>> irq_bypass_unregister_producer(&ctx->producer);
>> irq = pci_irq_vector(pdev, vector);
>> cmd = vfio_pci_memory_lock_and_enable(vdev);
>> @@ -435,8 +435,9 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
>> vfio_pci_memory_unlock_and_restore(vdev, cmd);
>> /* Interrupt stays allocated, will be freed at MSI-X disable. */
>> kfree(ctx->name);
>> + ctx->name = NULL;
>
> Setting ctx->name = NULL is not strictly necessary and does not match
> the INTx code that we're claiming to try to emulate. ctx->name is only
> tested immediately after allocation below, otherwise it can be inferred
> from ctx->trigger. Thanks,

This all matches my understanding. I added ctx->name = NULL after every kfree(ctx->name)
(see below for confirmation of other instance). You are correct that the flow
infers validity of ctx->name from ctx->trigger. My motivation for
adding ctx->name = NULL is that, since the interrupt context persists, this
change ensures that there will be no pointer that points to freed memory. I
am not comfortable leaving pointers to freed memory around.

>> eventfd_ctx_put(ctx->trigger);
>> - vfio_irq_ctx_free(vdev, ctx, vector);
>> + ctx->trigger = NULL;
>> }
>>
>> if (fd < 0)
>> @@ -449,16 +450,17 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
>> return irq;
>> }
>>
>> - ctx = vfio_irq_ctx_alloc(vdev, vector);
>> - if (!ctx)
>> - return -ENOMEM;
>> + /* Per-interrupt context remain allocated. */
>> + if (!ctx) {
>> + ctx = vfio_irq_ctx_alloc(vdev, vector);
>> + if (!ctx)
>> + return -ENOMEM;
>> + }
>>
>> ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-msi%s[%d](%s)",
>> msix ? "x" : "", vector, pci_name(pdev));
>> - if (!ctx->name) {
>> - ret = -ENOMEM;
>> - goto out_free_ctx;
>> - }
>> + if (!ctx->name)
>> + return -ENOMEM;
>>
>> trigger = eventfd_ctx_fdget(fd);
>> if (IS_ERR(trigger)) {
>> @@ -502,8 +504,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
>> eventfd_ctx_put(trigger);
>> out_free_name:
>> kfree(ctx->name);
>> -out_free_ctx:
>> - vfio_irq_ctx_free(vdev, ctx, vector);
>> + ctx->name = NULL;

Here is the other one.

Reinette

2024-02-06 21:46:16

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH 12/17] vfio/pci: Remove msi term from generic code

Hi Alex,

On 2/5/2024 2:35 PM, Alex Williamson wrote:
> On Thu, 1 Feb 2024 20:57:06 -0800
> Reinette Chatre <[email protected]> wrote:
>

>> @@ -786,14 +786,14 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
>> int ret;
>>
>> if (vdev->irq_type == index)
>
> It's unfortunate that the one place in the code that doesn't use
> irq_is() is the one that turns into the central callout function, maybe
> we can fix it along the way. Thanks,
>

Will do. Thank you.

Reinette

2024-02-06 21:46:53

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH 15/17] vfio/pci: Let enable and disable of interrupt types use same signature

Hi Alex,

On 2/5/2024 2:35 PM, Alex Williamson wrote:
> On Thu, 1 Feb 2024 20:57:09 -0800
> Reinette Chatre <[email protected]> wrote:

..

>> @@ -715,13 +724,13 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
>> if (is_intx(vdev))
>> return vfio_irq_set_block(vdev, start, count, fds, index);
>>
>> - ret = vfio_intx_enable(vdev);
>> + ret = vfio_intx_enable(vdev, start, count, index);
>
> Please trace what happens when a user calls SET_IRQS to setup a trigger
> eventfd with start = 0, count = 1, followed by any other combination of
> start and count values once is_intx() is true. vfio_intx_enable()
> cannot be the only place we bounds check the user, all of the INTx
> callbacks should be an error or nop if vector != 0. Thanks,
>

Thank you very much for catching this. I plan to add the vector
check to the device_name() and request_interrupt() callbacks. I do
not think it is necessary to add the vector check to disable() since
it does not operate on a range and from what I can tell it depends on
a successful enable() that already contains the vector check. Similar,
free_interrupt() requires a successful request_interrupt() (that will
have vector check in next version).
send_eventfd() requires a valid interrupt context that is only
possible if enable() or request_interrupt() succeeded.

If user space creates an eventfd with start = 0 and count = 1
and then attempts to trigger the eventfd using another combination then
the changes in this series will result in a nop while the current
implementation will result in -EINVAL. Is this acceptable?

Reinette

2024-02-06 21:47:15

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH 17/17] vfio/pci: Remove duplicate interrupt management flow

Hi Alex,

On 2/5/2024 2:35 PM, Alex Williamson wrote:
> On Thu, 1 Feb 2024 20:57:11 -0800
> Reinette Chatre <[email protected]> wrote:

..

>> @@ -799,11 +765,11 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
>> if (!ctx || !ctx->trigger)
>> continue;
>> if (flags & VFIO_IRQ_SET_DATA_NONE) {
>> - eventfd_signal(ctx->trigger);
>> + intr_ops[index].send_eventfd(vdev, ctx);
>> } else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
>> uint8_t *bools = data;
>
> Nit, an opportunity to add the missing new line between variable
> declaration and code here. Thanks,

Will do. Thank you.

Reinette

2024-02-06 22:03:38

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 07/17] vfio/pci: Preserve per-interrupt contexts

On Tue, 6 Feb 2024 13:45:22 -0800
Reinette Chatre <[email protected]> wrote:

> Hi Alex,
>
> On 2/5/2024 2:35 PM, Alex Williamson wrote:
> > On Thu, 1 Feb 2024 20:57:01 -0800
> > Reinette Chatre <[email protected]> wrote:
>
> ..
>
> >> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> >> index 31f73c70fcd2..7ca2b983b66e 100644
> >> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> >> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> >> @@ -427,7 +427,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
> >>
> >> ctx = vfio_irq_ctx_get(vdev, vector);
> >>
> >> - if (ctx) {
> >> + if (ctx && ctx->trigger) {
> >> irq_bypass_unregister_producer(&ctx->producer);
> >> irq = pci_irq_vector(pdev, vector);
> >> cmd = vfio_pci_memory_lock_and_enable(vdev);
> >> @@ -435,8 +435,9 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
> >> vfio_pci_memory_unlock_and_restore(vdev, cmd);
> >> /* Interrupt stays allocated, will be freed at MSI-X disable. */
> >> kfree(ctx->name);
> >> + ctx->name = NULL;
> >
> > Setting ctx->name = NULL is not strictly necessary and does not match
> > the INTx code that we're claiming to try to emulate. ctx->name is only
> > tested immediately after allocation below, otherwise it can be inferred
> > from ctx->trigger. Thanks,
>
> This all matches my understanding. I added ctx->name = NULL after every kfree(ctx->name)
> (see below for confirmation of other instance). You are correct that the flow
> infers validity of ctx->name from ctx->trigger. My motivation for
> adding ctx->name = NULL is that, since the interrupt context persists, this
> change ensures that there will be no pointer that points to freed memory. I
> am not comfortable leaving pointers to freed memory around.

Fair enough. Maybe note the change in the commit log. Thanks,

Alex

> >> eventfd_ctx_put(ctx->trigger);
> >> - vfio_irq_ctx_free(vdev, ctx, vector);
> >> + ctx->trigger = NULL;
> >> }
> >>
> >> if (fd < 0)
> >> @@ -449,16 +450,17 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
> >> return irq;
> >> }
> >>
> >> - ctx = vfio_irq_ctx_alloc(vdev, vector);
> >> - if (!ctx)
> >> - return -ENOMEM;
> >> + /* Per-interrupt context remain allocated. */
> >> + if (!ctx) {
> >> + ctx = vfio_irq_ctx_alloc(vdev, vector);
> >> + if (!ctx)
> >> + return -ENOMEM;
> >> + }
> >>
> >> ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-msi%s[%d](%s)",
> >> msix ? "x" : "", vector, pci_name(pdev));
> >> - if (!ctx->name) {
> >> - ret = -ENOMEM;
> >> - goto out_free_ctx;
> >> - }
> >> + if (!ctx->name)
> >> + return -ENOMEM;
> >>
> >> trigger = eventfd_ctx_fdget(fd);
> >> if (IS_ERR(trigger)) {
> >> @@ -502,8 +504,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
> >> eventfd_ctx_put(trigger);
> >> out_free_name:
> >> kfree(ctx->name);
> >> -out_free_ctx:
> >> - vfio_irq_ctx_free(vdev, ctx, vector);
> >> + ctx->name = NULL;
>
> Here is the other one.
>
> Reinette
>


2024-02-06 22:04:03

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 15/17] vfio/pci: Let enable and disable of interrupt types use same signature

On Tue, 6 Feb 2024 13:46:37 -0800
Reinette Chatre <[email protected]> wrote:

> Hi Alex,
>
> On 2/5/2024 2:35 PM, Alex Williamson wrote:
> > On Thu, 1 Feb 2024 20:57:09 -0800
> > Reinette Chatre <[email protected]> wrote:
>
> ..
>
> >> @@ -715,13 +724,13 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
> >> if (is_intx(vdev))
> >> return vfio_irq_set_block(vdev, start, count, fds, index);
> >>
> >> - ret = vfio_intx_enable(vdev);
> >> + ret = vfio_intx_enable(vdev, start, count, index);
> >
> > Please trace what happens when a user calls SET_IRQS to setup a trigger
> > eventfd with start = 0, count = 1, followed by any other combination of
> > start and count values once is_intx() is true. vfio_intx_enable()
> > cannot be the only place we bounds check the user, all of the INTx
> > callbacks should be an error or nop if vector != 0. Thanks,
> >
>
> Thank you very much for catching this. I plan to add the vector
> check to the device_name() and request_interrupt() callbacks. I do
> not think it is necessary to add the vector check to disable() since
> it does not operate on a range and from what I can tell it depends on
> a successful enable() that already contains the vector check. Similar,
> free_interrupt() requires a successful request_interrupt() (that will
> have vector check in next version).
> send_eventfd() requires a valid interrupt context that is only
> possible if enable() or request_interrupt() succeeded.

Sounds reasonable.

> If user space creates an eventfd with start = 0 and count = 1
> and then attempts to trigger the eventfd using another combination then
> the changes in this series will result in a nop while the current
> implementation will result in -EINVAL. Is this acceptable?

I think by nop, you mean the ioctl returns success. Was the call a
success? Thanks,

Alex


2024-02-06 22:21:46

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH 07/17] vfio/pci: Preserve per-interrupt contexts

Hi Alex,

On 2/6/2024 2:03 PM, Alex Williamson wrote:
> On Tue, 6 Feb 2024 13:45:22 -0800
> Reinette Chatre <[email protected]> wrote:
>> On 2/5/2024 2:35 PM, Alex Williamson wrote:
>>> On Thu, 1 Feb 2024 20:57:01 -0800
>>> Reinette Chatre <[email protected]> wrote:
>>
>> ..
>>
>>>> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
>>>> index 31f73c70fcd2..7ca2b983b66e 100644
>>>> --- a/drivers/vfio/pci/vfio_pci_intrs.c
>>>> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
>>>> @@ -427,7 +427,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
>>>>
>>>> ctx = vfio_irq_ctx_get(vdev, vector);
>>>>
>>>> - if (ctx) {
>>>> + if (ctx && ctx->trigger) {
>>>> irq_bypass_unregister_producer(&ctx->producer);
>>>> irq = pci_irq_vector(pdev, vector);
>>>> cmd = vfio_pci_memory_lock_and_enable(vdev);
>>>> @@ -435,8 +435,9 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
>>>> vfio_pci_memory_unlock_and_restore(vdev, cmd);
>>>> /* Interrupt stays allocated, will be freed at MSI-X disable. */
>>>> kfree(ctx->name);
>>>> + ctx->name = NULL;
>>>
>>> Setting ctx->name = NULL is not strictly necessary and does not match
>>> the INTx code that we're claiming to try to emulate. ctx->name is only
>>> tested immediately after allocation below, otherwise it can be inferred
>>> from ctx->trigger. Thanks,
>>
>> This all matches my understanding. I added ctx->name = NULL after every kfree(ctx->name)
>> (see below for confirmation of other instance). You are correct that the flow
>> infers validity of ctx->name from ctx->trigger. My motivation for
>> adding ctx->name = NULL is that, since the interrupt context persists, this
>> change ensures that there will be no pointer that points to freed memory. I
>> am not comfortable leaving pointers to freed memory around.
>
> Fair enough. Maybe note the change in the commit log. Thanks,
>

Will do. Thank you.

Reinette


2024-02-06 22:22:31

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH 15/17] vfio/pci: Let enable and disable of interrupt types use same signature

Hi Alex,

On 2/6/2024 2:03 PM, Alex Williamson wrote:
> On Tue, 6 Feb 2024 13:46:37 -0800
> Reinette Chatre <[email protected]> wrote:
>
>> Hi Alex,
>>
>> On 2/5/2024 2:35 PM, Alex Williamson wrote:
>>> On Thu, 1 Feb 2024 20:57:09 -0800
>>> Reinette Chatre <[email protected]> wrote:
>>
>> ..
>>
>>>> @@ -715,13 +724,13 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
>>>> if (is_intx(vdev))
>>>> return vfio_irq_set_block(vdev, start, count, fds, index);
>>>>
>>>> - ret = vfio_intx_enable(vdev);
>>>> + ret = vfio_intx_enable(vdev, start, count, index);
>>>
>>> Please trace what happens when a user calls SET_IRQS to setup a trigger
>>> eventfd with start = 0, count = 1, followed by any other combination of
>>> start and count values once is_intx() is true. vfio_intx_enable()
>>> cannot be the only place we bounds check the user, all of the INTx
>>> callbacks should be an error or nop if vector != 0. Thanks,
>>>
>>
>> Thank you very much for catching this. I plan to add the vector
>> check to the device_name() and request_interrupt() callbacks. I do
>> not think it is necessary to add the vector check to disable() since
>> it does not operate on a range and from what I can tell it depends on
>> a successful enable() that already contains the vector check. Similar,
>> free_interrupt() requires a successful request_interrupt() (that will
>> have vector check in next version).
>> send_eventfd() requires a valid interrupt context that is only
>> possible if enable() or request_interrupt() succeeded.
>
> Sounds reasonable.
>
>> If user space creates an eventfd with start = 0 and count = 1
>> and then attempts to trigger the eventfd using another combination then
>> the changes in this series will result in a nop while the current
>> implementation will result in -EINVAL. Is this acceptable?
>
> I think by nop, you mean the ioctl returns success. Was the call a
> success? Thanks,

Yes, I mean the ioctl returns success without taking any
action (nop).

It is not obvious to me how to interpret "success" because from what I
understand current INTx and MSI/MSI-x are behaving differently when
considering this flow. If I understand correctly, INTx will return
an error if user space attempts to trigger an eventfd that has not
been set up while MSI and MSI-x will return 0.

I can restore existing INTx behavior by adding more logic and a return
code to the send_eventfd() callback so that the different interrupt types
can maintain their existing behavior.

Reinette


2024-02-06 23:19:56

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 15/17] vfio/pci: Let enable and disable of interrupt types use same signature

On Tue, 6 Feb 2024 14:22:04 -0800
Reinette Chatre <[email protected]> wrote:

> Hi Alex,
>
> On 2/6/2024 2:03 PM, Alex Williamson wrote:
> > On Tue, 6 Feb 2024 13:46:37 -0800
> > Reinette Chatre <[email protected]> wrote:
> >
> >> Hi Alex,
> >>
> >> On 2/5/2024 2:35 PM, Alex Williamson wrote:
> >>> On Thu, 1 Feb 2024 20:57:09 -0800
> >>> Reinette Chatre <[email protected]> wrote:
> >>
> >> ..
> >>
> >>>> @@ -715,13 +724,13 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
> >>>> if (is_intx(vdev))
> >>>> return vfio_irq_set_block(vdev, start, count, fds, index);
> >>>>
> >>>> - ret = vfio_intx_enable(vdev);
> >>>> + ret = vfio_intx_enable(vdev, start, count, index);
> >>>
> >>> Please trace what happens when a user calls SET_IRQS to setup a trigger
> >>> eventfd with start = 0, count = 1, followed by any other combination of
> >>> start and count values once is_intx() is true. vfio_intx_enable()
> >>> cannot be the only place we bounds check the user, all of the INTx
> >>> callbacks should be an error or nop if vector != 0. Thanks,
> >>>
> >>
> >> Thank you very much for catching this. I plan to add the vector
> >> check to the device_name() and request_interrupt() callbacks. I do
> >> not think it is necessary to add the vector check to disable() since
> >> it does not operate on a range and from what I can tell it depends on
> >> a successful enable() that already contains the vector check. Similar,
> >> free_interrupt() requires a successful request_interrupt() (that will
> >> have vector check in next version).
> >> send_eventfd() requires a valid interrupt context that is only
> >> possible if enable() or request_interrupt() succeeded.
> >
> > Sounds reasonable.
> >
> >> If user space creates an eventfd with start = 0 and count = 1
> >> and then attempts to trigger the eventfd using another combination then
> >> the changes in this series will result in a nop while the current
> >> implementation will result in -EINVAL. Is this acceptable?
> >
> > I think by nop, you mean the ioctl returns success. Was the call a
> > success? Thanks,
>
> Yes, I mean the ioctl returns success without taking any
> action (nop).
>
> It is not obvious to me how to interpret "success" because from what I
> understand current INTx and MSI/MSI-x are behaving differently when
> considering this flow. If I understand correctly, INTx will return
> an error if user space attempts to trigger an eventfd that has not
> been set up while MSI and MSI-x will return 0.
>
> I can restore existing INTx behavior by adding more logic and a return
> code to the send_eventfd() callback so that the different interrupt types
> can maintain their existing behavior.

Ah yes, I see the dilemma now. INTx always checked start/count were
valid but MSI/X plowed through regardless, and with this series we've
standardized the loop around the MSI/X flow.

Tricky, but probably doesn't really matter. Unless we break someone.

I can ignore that INTx can be masked and signaling a masked vector
doesn't do anything, but signaling an unconfigured vector feels like an
error condition and trying to create verbiage in the uAPI header to
weasel out of that error and unconditionally return success makes me
cringe.

What if we did this:

uint8_t *bools = data;
...
for (i = start; i < start + count; i++) {
if ((flags & VFIO_IRQ_SET_DATA_NONE) ||
((flags & VFIO_IRQ_SET_DATA_BOOL) && bools[i - start])) {
ctx = vfio_irq_ctx_get(vdev, i);
if (!ctx || !ctx->trigger)
return -EINVAL;
intr_ops[index].send_eventfd(vdev, ctx);
}
}

And we note the behavior change for MSI/X in the commit log and if
someone shouts that we broke them, we can make that an -errno or
continue based on is_intx(). Sound ok? Thanks,

Alex


2024-02-07 23:30:40

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH 15/17] vfio/pci: Let enable and disable of interrupt types use same signature

Hi Alex,

On 2/6/2024 3:19 PM, Alex Williamson wrote:
> On Tue, 6 Feb 2024 14:22:04 -0800
> Reinette Chatre <[email protected]> wrote:
>> On 2/6/2024 2:03 PM, Alex Williamson wrote:
>>> On Tue, 6 Feb 2024 13:46:37 -0800
>>> Reinette Chatre <[email protected]> wrote:
>>>> On 2/5/2024 2:35 PM, Alex Williamson wrote:
>>>>> On Thu, 1 Feb 2024 20:57:09 -0800
>>>>> Reinette Chatre <[email protected]> wrote:
>>>>
>>>> ..
>>>>
>>>>>> @@ -715,13 +724,13 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
>>>>>> if (is_intx(vdev))
>>>>>> return vfio_irq_set_block(vdev, start, count, fds, index);
>>>>>>
>>>>>> - ret = vfio_intx_enable(vdev);
>>>>>> + ret = vfio_intx_enable(vdev, start, count, index);
>>>>>
>>>>> Please trace what happens when a user calls SET_IRQS to setup a trigger
>>>>> eventfd with start = 0, count = 1, followed by any other combination of
>>>>> start and count values once is_intx() is true. vfio_intx_enable()
>>>>> cannot be the only place we bounds check the user, all of the INTx
>>>>> callbacks should be an error or nop if vector != 0. Thanks,
>>>>>
>>>>
>>>> Thank you very much for catching this. I plan to add the vector
>>>> check to the device_name() and request_interrupt() callbacks. I do
>>>> not think it is necessary to add the vector check to disable() since
>>>> it does not operate on a range and from what I can tell it depends on
>>>> a successful enable() that already contains the vector check. Similar,
>>>> free_interrupt() requires a successful request_interrupt() (that will
>>>> have vector check in next version).
>>>> send_eventfd() requires a valid interrupt context that is only
>>>> possible if enable() or request_interrupt() succeeded.
>>>
>>> Sounds reasonable.
>>>
>>>> If user space creates an eventfd with start = 0 and count = 1
>>>> and then attempts to trigger the eventfd using another combination then
>>>> the changes in this series will result in a nop while the current
>>>> implementation will result in -EINVAL. Is this acceptable?
>>>
>>> I think by nop, you mean the ioctl returns success. Was the call a
>>> success? Thanks,
>>
>> Yes, I mean the ioctl returns success without taking any
>> action (nop).
>>
>> It is not obvious to me how to interpret "success" because from what I
>> understand current INTx and MSI/MSI-x are behaving differently when
>> considering this flow. If I understand correctly, INTx will return
>> an error if user space attempts to trigger an eventfd that has not
>> been set up while MSI and MSI-x will return 0.
>>
>> I can restore existing INTx behavior by adding more logic and a return
>> code to the send_eventfd() callback so that the different interrupt types
>> can maintain their existing behavior.
>
> Ah yes, I see the dilemma now. INTx always checked start/count were
> valid but MSI/X plowed through regardless, and with this series we've
> standardized the loop around the MSI/X flow.
>
> Tricky, but probably doesn't really matter. Unless we break someone.
>
> I can ignore that INTx can be masked and signaling a masked vector
> doesn't do anything, but signaling an unconfigured vector feels like an
> error condition and trying to create verbiage in the uAPI header to
> weasel out of that error and unconditionally return success makes me
> cringe.
>
> What if we did this:
>
> uint8_t *bools = data;
> ...
> for (i = start; i < start + count; i++) {
> if ((flags & VFIO_IRQ_SET_DATA_NONE) ||
> ((flags & VFIO_IRQ_SET_DATA_BOOL) && bools[i - start])) {
> ctx = vfio_irq_ctx_get(vdev, i);
> if (!ctx || !ctx->trigger)
> return -EINVAL;
> intr_ops[index].send_eventfd(vdev, ctx);
> }
> }
>

This looks good. Thank you very much. Will do.

I studied the code more and have one more observation related to this portion
of the flow:
From what I can tell this change makes the INTx code more robust. If I
understand current implementation correctly it seems possible to enable
INTx but not have interrupt allocated. In this case the interrupt context
(ctx) will exist but ctx->trigger will be NULL. Current
vfio_pci_set_intx_trigger()->vfio_send_intx_eventfd() only checks if
ctx is valid. It looks like it may call eventfd_signal(NULL) where
pointer is dereferenced.

If this is correct then I think a separate fix that can easily be
backported may be needed. Something like:

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 237beac83809..17ec46d8ab29 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -92,7 +92,7 @@ static void vfio_send_intx_eventfd(void *opaque, void *unused)
struct vfio_pci_irq_ctx *ctx;

ctx = vfio_irq_ctx_get(vdev, 0);
- if (WARN_ON_ONCE(!ctx))
+ if (WARN_ON_ONCE(!ctx || !ctx->trigger))
return;
eventfd_signal(ctx->trigger);
}

> And we note the behavior change for MSI/X in the commit log and if
> someone shouts that we broke them, we can make that an -errno or
> continue based on is_intx(). Sound ok? Thanks,

I'll be sure to highlight the impact on MSI/MSI-x. Please do expect this
in the final patch "vfio/pci: Remove duplicate interrupt management flow"
though since that is where the different flows are merged.

I am not familiar with how all user space interacts with this flow and if/how
this may break things. I did look at Qemu code and I was not able to find
where it intentionally triggers MSI/MSI-x interrupts, I could only find it
for INTx.

If this does break things I would like to also consider moving the
different behavior into the interrupt type's respective send_eventfd()
callback instead of adding interrupt type specific code (like
is_intx()) into the shared flow.

Thank you.

Reinette

2024-02-08 21:41:22

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 15/17] vfio/pci: Let enable and disable of interrupt types use same signature

On Wed, 7 Feb 2024 15:30:15 -0800
Reinette Chatre <[email protected]> wrote:

> Hi Alex,
>
> On 2/6/2024 3:19 PM, Alex Williamson wrote:
> > On Tue, 6 Feb 2024 14:22:04 -0800
> > Reinette Chatre <[email protected]> wrote:
> >> On 2/6/2024 2:03 PM, Alex Williamson wrote:
> >>> On Tue, 6 Feb 2024 13:46:37 -0800
> >>> Reinette Chatre <[email protected]> wrote:
> >>>> On 2/5/2024 2:35 PM, Alex Williamson wrote:
> >>>>> On Thu, 1 Feb 2024 20:57:09 -0800
> >>>>> Reinette Chatre <[email protected]> wrote:
> >>>>
> >>>> ..
> >>>>
> >>>>>> @@ -715,13 +724,13 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
> >>>>>> if (is_intx(vdev))
> >>>>>> return vfio_irq_set_block(vdev, start, count, fds, index);
> >>>>>>
> >>>>>> - ret = vfio_intx_enable(vdev);
> >>>>>> + ret = vfio_intx_enable(vdev, start, count, index);
> >>>>>
> >>>>> Please trace what happens when a user calls SET_IRQS to setup a trigger
> >>>>> eventfd with start = 0, count = 1, followed by any other combination of
> >>>>> start and count values once is_intx() is true. vfio_intx_enable()
> >>>>> cannot be the only place we bounds check the user, all of the INTx
> >>>>> callbacks should be an error or nop if vector != 0. Thanks,
> >>>>>
> >>>>
> >>>> Thank you very much for catching this. I plan to add the vector
> >>>> check to the device_name() and request_interrupt() callbacks. I do
> >>>> not think it is necessary to add the vector check to disable() since
> >>>> it does not operate on a range and from what I can tell it depends on
> >>>> a successful enable() that already contains the vector check. Similar,
> >>>> free_interrupt() requires a successful request_interrupt() (that will
> >>>> have vector check in next version).
> >>>> send_eventfd() requires a valid interrupt context that is only
> >>>> possible if enable() or request_interrupt() succeeded.
> >>>
> >>> Sounds reasonable.
> >>>
> >>>> If user space creates an eventfd with start = 0 and count = 1
> >>>> and then attempts to trigger the eventfd using another combination then
> >>>> the changes in this series will result in a nop while the current
> >>>> implementation will result in -EINVAL. Is this acceptable?
> >>>
> >>> I think by nop, you mean the ioctl returns success. Was the call a
> >>> success? Thanks,
> >>
> >> Yes, I mean the ioctl returns success without taking any
> >> action (nop).
> >>
> >> It is not obvious to me how to interpret "success" because from what I
> >> understand current INTx and MSI/MSI-x are behaving differently when
> >> considering this flow. If I understand correctly, INTx will return
> >> an error if user space attempts to trigger an eventfd that has not
> >> been set up while MSI and MSI-x will return 0.
> >>
> >> I can restore existing INTx behavior by adding more logic and a return
> >> code to the send_eventfd() callback so that the different interrupt types
> >> can maintain their existing behavior.
> >
> > Ah yes, I see the dilemma now. INTx always checked start/count were
> > valid but MSI/X plowed through regardless, and with this series we've
> > standardized the loop around the MSI/X flow.
> >
> > Tricky, but probably doesn't really matter. Unless we break someone.
> >
> > I can ignore that INTx can be masked and signaling a masked vector
> > doesn't do anything, but signaling an unconfigured vector feels like an
> > error condition and trying to create verbiage in the uAPI header to
> > weasel out of that error and unconditionally return success makes me
> > cringe.
> >
> > What if we did this:
> >
> > uint8_t *bools = data;
> > ...
> > for (i = start; i < start + count; i++) {
> > if ((flags & VFIO_IRQ_SET_DATA_NONE) ||
> > ((flags & VFIO_IRQ_SET_DATA_BOOL) && bools[i - start])) {
> > ctx = vfio_irq_ctx_get(vdev, i);
> > if (!ctx || !ctx->trigger)
> > return -EINVAL;
> > intr_ops[index].send_eventfd(vdev, ctx);
> > }
> > }
> >
>
> This looks good. Thank you very much. Will do.
>
> I studied the code more and have one more observation related to this portion
> of the flow:
> From what I can tell this change makes the INTx code more robust. If I
> understand current implementation correctly it seems possible to enable
> INTx but not have interrupt allocated. In this case the interrupt context
> (ctx) will exist but ctx->trigger will be NULL. Current
> vfio_pci_set_intx_trigger()->vfio_send_intx_eventfd() only checks if
> ctx is valid. It looks like it may call eventfd_signal(NULL) where
> pointer is dereferenced.
>
> If this is correct then I think a separate fix that can easily be
> backported may be needed. Something like:

Good find. I think it's a bit more complicated though. There are
several paths to vfio_send_intx_eventfd:

- vfio_intx_handler

This can only be called between request_irq() and free_irq()
where trigger is always valid. igate is not held.

- vfio_pci_intx_unmask

Callers hold igate, additional test of ctx->trigger makes this
safe.

- vfio_pci_set_intx_trigger

Same as above.

- Through unmask eventfd (virqfd)

Here be dragons.

In the virqfd case, a write to the eventfd calls virqfd_wakeup() where
we'll call the handler, vfio_pci_intx_unmask_handler(), and based on
the result schedule the thread, vfio_send_intx_eventfd(). Both of
these look suspicious. They're not called under igate, so testing
ctx->trigger doesn't resolve the race.

I think an option is to wrap the virqfd entry points in igate where we
can then do something similar to your suggestion. I don't think we
want to WARN_ON(!ctx->trigger) because that's then a user reachable
condition. Instead we can just quietly follow the same exit paths.

I think that means we end up with something like this:

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 237beac83809..ace7e1dbc607 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -92,12 +92,21 @@ static void vfio_send_intx_eventfd(void *opaque, void *unused)
struct vfio_pci_irq_ctx *ctx;

ctx = vfio_irq_ctx_get(vdev, 0);
- if (WARN_ON_ONCE(!ctx))
+ if (WARN_ON_ONCE(!ctx) || !ctx->trigger)
return;
eventfd_signal(ctx->trigger);
}
}

+static void vfio_send_intx_eventfd_virqfd(void *opaque, void *unused)
+{
+ struct vfio_pci_core_device *vdev = opaque;
+
+ mutex_lock(&vdev->igate);
+ vfio_send_intx_eventfd(opaque, unused);
+ mutex_unlock(&vdev->igate);
+}
+
/* Returns true if the INTx vfio_pci_irq_ctx.masked value is changed. */
bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
{
@@ -170,7 +179,7 @@ static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
}

ctx = vfio_irq_ctx_get(vdev, 0);
- if (WARN_ON_ONCE(!ctx))
+ if (WARN_ON_ONCE(!ctx) || !ctx->trigger)
goto out_unlock;

if (ctx->masked && !vdev->virq_disabled) {
@@ -194,6 +203,18 @@ static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
return ret;
}

+static int vfio_pci_intx_unmask_handler_virqfd(void *opaque, void *unused)
+{
+ struct vfio_pci_core_device *vdev = opaque;
+ int ret;
+
+ mutex_lock(&vdev->igate);
+ ret = vfio_pci_intx_unmask_handler(opaque, unused);
+ mutex_unlock(&vdev->igate);
+
+ return ret;
+}
+
void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev)
{
if (vfio_pci_intx_unmask_handler(vdev, NULL) > 0)
@@ -572,10 +593,10 @@ static int vfio_pci_set_intx_unmask(struct vfio_pci_core_device *vdev,
if (WARN_ON_ONCE(!ctx))
return -EINVAL;
if (fd >= 0)
- return vfio_virqfd_enable((void *) vdev,
- vfio_pci_intx_unmask_handler,
- vfio_send_intx_eventfd, NULL,
- &ctx->unmask, fd);
+ return vfio_virqfd_enable((void *)vdev,
+ vfio_pci_intx_unmask_handler_virqfd,
+ vfio_send_intx_eventfd_virqfd, NULL,
+ &ctx->unmask, fd);

vfio_virqfd_disable(&ctx->unmask);
}


WDYT?

> > And we note the behavior change for MSI/X in the commit log and if
> > someone shouts that we broke them, we can make that an -errno or
> > continue based on is_intx(). Sound ok? Thanks,
>
> I'll be sure to highlight the impact on MSI/MSI-x. Please do expect this
> in the final patch "vfio/pci: Remove duplicate interrupt management flow"
> though since that is where the different flows are merged.
>
> I am not familiar with how all user space interacts with this flow and if/how
> this may break things. I did look at Qemu code and I was not able to find
> where it intentionally triggers MSI/MSI-x interrupts, I could only find it
> for INTx.

Being able to trigger the interrupt via ioctl is more of a diagnostic
feature, not typically used in production.

> If this does break things I would like to also consider moving the
> different behavior into the interrupt type's respective send_eventfd()
> callback instead of adding interrupt type specific code (like
> is_intx()) into the shared flow.

Sure, we can pick the best option in the slim (imo) chance the change
affects anyone. Thanks,

Alex


2024-02-14 20:14:40

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH 15/17] vfio/pci: Let enable and disable of interrupt types use same signature

Hi Alex,

Apologies for the delay. This was due to two parts:
* I studied these code paths more and I think there may be one more
flow that can be made more robust (more later),
* I spent a lot of time trying to trigger all the affected code paths but
here I did not have much luck. I would prefer to run tests that trigger
these flows so that I can get some help from the kernel lock debugging.
From what I can tell the irqfd flows can be triggered with the help of the
kernel-irqchip option to Qemu but on the hardware I have access to I
could only try kernel-irqchip=auto and that did not trigger the flows
(the irqfd is set up but the eventfd is never signaled).
I am not familiar with this area, do you perhaps have guidance on how
I can test these flows or do you perhaps have access to needed environments
to test this?

On 2/8/2024 1:08 PM, Alex Williamson wrote:
> On Wed, 7 Feb 2024 15:30:15 -0800
> Reinette Chatre <[email protected]> wrote:

>> I studied the code more and have one more observation related to this portion
>> of the flow:
>> From what I can tell this change makes the INTx code more robust. If I
>> understand current implementation correctly it seems possible to enable
>> INTx but not have interrupt allocated. In this case the interrupt context
>> (ctx) will exist but ctx->trigger will be NULL. Current
>> vfio_pci_set_intx_trigger()->vfio_send_intx_eventfd() only checks if
>> ctx is valid. It looks like it may call eventfd_signal(NULL) where
>> pointer is dereferenced.
>>
>> If this is correct then I think a separate fix that can easily be
>> backported may be needed. Something like:
>
> Good find. I think it's a bit more complicated though. There are
> several paths to vfio_send_intx_eventfd:
>
> - vfio_intx_handler
>
> This can only be called between request_irq() and free_irq()
> where trigger is always valid. igate is not held.
>
> - vfio_pci_intx_unmask
>
> Callers hold igate, additional test of ctx->trigger makes this
> safe.

Two callers of vfio_pci_intx_unmask() do not seem to hold igate:
vfio_basic_config_write() and vfio_pci_core_runtime_resume().

Considering this I wonder if we could add something like below to the
solution you propose. On a high level the outside callers (VFIO PCI core)
will keep using vfio_pci_intx_unmask() that will now take igate while
the interrupt management code gets a new internal __vfio_pci_intx_unmask()
that should be called with igate held. This results in:

@@ -215,12 +223,20 @@ static int vfio_pci_intx_unmask_handler_virqfd(void *opaque, void *unused)
return ret;
}

-void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev)
+static void __vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev)
{
+ lockdep_assert_held(&vdev->igate);
if (vfio_pci_intx_unmask_handler(vdev, NULL) > 0)
vfio_send_intx_eventfd(vdev, NULL);
}

+void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev)
+{
+ mutex_lock(&vdev->igate);
+ __vfio_pci_intx_unmask(vdev);
+ mutex_unlock(&vdev->igate);
+}
+
static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
{
struct vfio_pci_core_device *vdev = dev_id;
@@ -581,11 +597,11 @@ static int vfio_pci_set_intx_unmask(struct vfio_pci_core_device *vdev,
return -EINVAL;

if (flags & VFIO_IRQ_SET_DATA_NONE) {
- vfio_pci_intx_unmask(vdev);
+ __vfio_pci_intx_unmask(vdev);
} else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
uint8_t unmask = *(uint8_t *)data;
if (unmask)
- vfio_pci_intx_unmask(vdev);
+ __vfio_pci_intx_unmask(vdev);
} else if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
struct vfio_pci_irq_ctx *ctx = vfio_irq_ctx_get(vdev, 0);
int32_t fd = *(int32_t *)data;

>
> - vfio_pci_set_intx_trigger
>
> Same as above.
>
> - Through unmask eventfd (virqfd)
>
> Here be dragons.
>
> In the virqfd case, a write to the eventfd calls virqfd_wakeup() where
> we'll call the handler, vfio_pci_intx_unmask_handler(), and based on
> the result schedule the thread, vfio_send_intx_eventfd(). Both of
> these look suspicious. They're not called under igate, so testing
> ctx->trigger doesn't resolve the race.
>
> I think an option is to wrap the virqfd entry points in igate where we
> can then do something similar to your suggestion. I don't think we
> want to WARN_ON(!ctx->trigger) because that's then a user reachable
> condition. Instead we can just quietly follow the same exit paths.
>
> I think that means we end up with something like this:
>
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 237beac83809..ace7e1dbc607 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -92,12 +92,21 @@ static void vfio_send_intx_eventfd(void *opaque, void *unused)
> struct vfio_pci_irq_ctx *ctx;
>
> ctx = vfio_irq_ctx_get(vdev, 0);
> - if (WARN_ON_ONCE(!ctx))
> + if (WARN_ON_ONCE(!ctx) || !ctx->trigger)
> return;
> eventfd_signal(ctx->trigger);
> }
> }
>
> +static void vfio_send_intx_eventfd_virqfd(void *opaque, void *unused)
> +{
> + struct vfio_pci_core_device *vdev = opaque;
> +
> + mutex_lock(&vdev->igate);
> + vfio_send_intx_eventfd(opaque, unused);
> + mutex_unlock(&vdev->igate);
> +}
> +
> /* Returns true if the INTx vfio_pci_irq_ctx.masked value is changed. */
> bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
> {
> @@ -170,7 +179,7 @@ static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
> }
>
> ctx = vfio_irq_ctx_get(vdev, 0);
> - if (WARN_ON_ONCE(!ctx))
> + if (WARN_ON_ONCE(!ctx) || !ctx->trigger)
> goto out_unlock;
>
> if (ctx->masked && !vdev->virq_disabled) {
> @@ -194,6 +203,18 @@ static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
> return ret;
> }
>
> +static int vfio_pci_intx_unmask_handler_virqfd(void *opaque, void *unused)
> +{
> + struct vfio_pci_core_device *vdev = opaque;
> + int ret;
> +
> + mutex_lock(&vdev->igate);
> + ret = vfio_pci_intx_unmask_handler(opaque, unused);
> + mutex_unlock(&vdev->igate);
> +
> + return ret;
> +}
> +
> void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev)
> {
> if (vfio_pci_intx_unmask_handler(vdev, NULL) > 0)
> @@ -572,10 +593,10 @@ static int vfio_pci_set_intx_unmask(struct vfio_pci_core_device *vdev,
> if (WARN_ON_ONCE(!ctx))
> return -EINVAL;
> if (fd >= 0)
> - return vfio_virqfd_enable((void *) vdev,
> - vfio_pci_intx_unmask_handler,
> - vfio_send_intx_eventfd, NULL,
> - &ctx->unmask, fd);
> + return vfio_virqfd_enable((void *)vdev,
> + vfio_pci_intx_unmask_handler_virqfd,
> + vfio_send_intx_eventfd_virqfd, NULL,
> + &ctx->unmask, fd);
>
> vfio_virqfd_disable(&ctx->unmask);
> }
>
>
> WDYT?

This looks good to me. Thank you very much for taking the time to
write it.

Reinette