2019-09-13 02:00:19

by Megha Dey

[permalink] [raw]
Subject: [RFC V1 0/7] Add support for a new IMS interrupt mechanism

Currently, MSI (Message signaled interrupts) and MSI-X are the de facto
standard for device interrupt mechanism. MSI-X supports up to 2048
interrupts per device while MSI supports 32, which seems more than enough
for current devices. However, the introduction of SIOV (Scalable IO
virtualization) shifts the creation of assignable virtual devices from
hardware to a more software assisted approach. This flexible composition
of direct assignable devices, a.k.a. assignable device interfaces (ADIs)
unchains hardware from costly PCI standard. Under SIOV, device resource
can now be mapped directly to a guest or other user space drivers for
near native DMA performance. To complete functionality of ADIs, a matching
interrupt resource must also be introduced which will be scalable.

Interrupt message storage (IMS) is conceived as a scalable albeit device
specific interrupt mechanism to meet such a demand. With IMS, there is
theoretically no upper bound on the number of interrupts which a device
can support. The size and location of IMS is device-specific; some devices
may implement IMS as on-device storage which are memory-mapped, others may
opt to implement IMS in system memory. IMS stores each interrupt message as
a DWORD size data payload and a 64-bit address(same as MSI-X). Access to
the IMS is through the host driver due to the non-architectural nature of
device IMS unlike the architectural MSI-X table which are accessed through
PCI drivers.

In this patchset, we introduce generic IMS APIs that fits the Linux IRQ
subsystem, supports IMS IRQ chip and domains that can be used by drivers
which are capable of generating IMS interrupts.

The IMS has been introduced as part of Intel's Scalable I/O virtualization
specification:
https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification

This patchset is based on Linux 5.3-rc8.

Currently there is no device out in the market which supports SIOV (Hence no
device supports IMS).

This series is a basic patchset to get the ball rolling and receive some
inital comments. As per my discussion with Marc Zyngier and Thomas Gleixner
at the Linux Plumbers, I need to do the following:
1. Since a device can support MSI-X and IMS simultaneously, ensure proper
locking mechanism for the 'msi_list' in the device structure.
2. Introduce dynamic allocation of IMS vectors perhaps by using a group ID
3. IMS support of a device needs to be discoverable. A bit in the vendor
specific capability in the PCI config is to be added rather than getting
this information from each device driver.

Jason Gunthorpe of Mellanox technologies is looking to do something similar
on ARM platforms and was wondering why IMS is x86 sepcific. Perhaps we can
use this thread to discuss further on this.

Megha Dey (7):
genirq/msi: Differentiate between various MSI based interrupts
drivers/base: Introduce callbacks for IMS interrupt domain
x86/ims: Add support for a new IMS irq domain
irq_remapping: New interfaces to support IMS irqdomain
x86/ims: Introduce x86_ims_ops
ims-msi: Add APIs to allocate/free IMS interrupts
ims: Add the set_desc callback

arch/mips/pci/msi-xlp.c | 2 +-
arch/s390/pci/pci_irq.c | 2 +-
arch/x86/include/asm/irq_remapping.h | 13 ++
arch/x86/include/asm/msi.h | 4 +
arch/x86/include/asm/pci.h | 4 +
arch/x86/include/asm/x86_init.h | 10 +
arch/x86/kernel/apic/Makefile | 1 +
arch/x86/kernel/apic/ims.c | 118 ++++++++++++
arch/x86/kernel/apic/msi.c | 6 +-
arch/x86/kernel/x86_init.c | 23 +++
arch/x86/pci/xen.c | 2 +-
drivers/base/Kconfig | 7 +
drivers/base/Makefile | 1 +
drivers/base/ims-msi.c | 353 +++++++++++++++++++++++++++++++++++
drivers/iommu/intel_irq_remapping.c | 30 +++
drivers/iommu/irq_remapping.c | 9 +
drivers/iommu/irq_remapping.h | 3 +
drivers/pci/msi.c | 19 +-
drivers/vfio/mdev/mdev_core.c | 6 +
drivers/vfio/mdev/mdev_private.h | 1 -
include/linux/intel-iommu.h | 1 +
include/linux/mdev.h | 2 +
include/linux/msi.h | 55 +++++-
kernel/irq/msi.c | 2 +-
24 files changed, 655 insertions(+), 19 deletions(-)
create mode 100644 arch/x86/kernel/apic/ims.c
create mode 100644 drivers/base/ims-msi.c

--
2.7.4


2019-09-13 02:25:08

by Megha Dey

[permalink] [raw]
Subject: [RFC V1 4/7] irq_remapping: New interfaces to support IMS irqdomain

Introduce new interfaces for interrupt remapping drivers to support
IMS irqdomains:

irq_remapping_get_ims_irq_domain(): get the IMS irqdomain for an IRQ
allocation. We must build one IMS irqdomain for each interrupt remapping
unit. The driver calls this interface to get the IMS irqdomain associated
with an IR irqdomain which manages the devices.

Architecture specific hooks:
arch_create_ims_irq_domain(): create an IMS irqdomain associated with the
interrupt remapping unit.

We also add following callback into struct irq_remap_ops:
struct irq_domain *(*get_ims_irq_domain)(struct irq_alloc_info *);

Cc: Jacob Pan <[email protected]>
Signed-off-by: Sanjay Kumar <[email protected]>
Signed-off-by: Megha Dey <[email protected]>
---
arch/x86/include/asm/irq_remapping.h | 13 +++++++++++++
drivers/iommu/intel_irq_remapping.c | 30 ++++++++++++++++++++++++++++++
drivers/iommu/irq_remapping.c | 9 +++++++++
drivers/iommu/irq_remapping.h | 3 +++
include/linux/intel-iommu.h | 1 +
5 files changed, 56 insertions(+)

diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index 4bc985f..a735507 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -48,11 +48,18 @@ extern struct irq_domain *
irq_remapping_get_ir_irq_domain(struct irq_alloc_info *info);
extern struct irq_domain *
irq_remapping_get_irq_domain(struct irq_alloc_info *info);
+extern struct irq_domain *
+irq_remapping_get_ims_irq_domain(struct irq_alloc_info *info);

/* Create PCI MSI/MSIx irqdomain, use @parent as the parent irqdomain. */
extern struct irq_domain *
arch_create_remap_msi_irq_domain(struct irq_domain *par, const char *n, int id);

+/* Create IMS irqdomain, use @parent as the parent irqdomain. */
+#ifdef CONFIG_MSI_IMS
+extern struct irq_domain *arch_create_ims_irq_domain(struct irq_domain *parent);
+#endif
+
/* Get parent irqdomain for interrupt remapping irqdomain */
static inline struct irq_domain *arch_get_ir_parent_domain(void)
{
@@ -85,5 +92,11 @@ irq_remapping_get_irq_domain(struct irq_alloc_info *info)
return NULL;
}

+static inline struct irq_domain *
+irq_remapping_get_ims_irq_domain(struct irq_alloc_info *info)
+{
+ return NULL;
+}
+
#endif /* CONFIG_IRQ_REMAP */
#endif /* __X86_IRQ_REMAPPING_H */
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 4786ca0..3c0c0cb 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -573,6 +573,10 @@ static int intel_setup_irq_remapping(struct intel_iommu *iommu)
"INTEL-IR-MSI",
iommu->seq_id);

+#ifdef CONFIG_MSI_IMS
+ iommu->ir_ims_domain = arch_create_ims_irq_domain(iommu->ir_domain);
+#endif
+
ir_table->base = page_address(pages);
ir_table->bitmap = bitmap;
iommu->ir_table = ir_table;
@@ -633,6 +637,10 @@ static void intel_teardown_irq_remapping(struct intel_iommu *iommu)
irq_domain_remove(iommu->ir_msi_domain);
iommu->ir_msi_domain = NULL;
}
+ if (iommu->ir_ims_domain) {
+ irq_domain_remove(iommu->ir_ims_domain);
+ iommu->ir_ims_domain = NULL;
+ }
if (iommu->ir_domain) {
irq_domain_remove(iommu->ir_domain);
iommu->ir_domain = NULL;
@@ -1139,6 +1147,27 @@ static struct irq_domain *intel_get_irq_domain(struct irq_alloc_info *info)
return NULL;
}

+static struct irq_domain *intel_get_ims_irq_domain(struct irq_alloc_info *info)
+{
+ struct intel_iommu *iommu;
+
+ if (!info)
+ return NULL;
+
+ switch (info->type) {
+ case X86_IRQ_ALLOC_TYPE_MSI:
+ case X86_IRQ_ALLOC_TYPE_MSIX:
+ iommu = map_dev_to_ir(info->msi_dev);
+ if (iommu)
+ return iommu->ir_ims_domain;
+ break;
+ default:
+ break;
+ }
+
+ return NULL;
+}
+
struct irq_remap_ops intel_irq_remap_ops = {
.prepare = intel_prepare_irq_remapping,
.enable = intel_enable_irq_remapping,
@@ -1147,6 +1176,7 @@ struct irq_remap_ops intel_irq_remap_ops = {
.enable_faulting = enable_drhd_fault_handling,
.get_ir_irq_domain = intel_get_ir_irq_domain,
.get_irq_domain = intel_get_irq_domain,
+ .get_ims_irq_domain = intel_get_ims_irq_domain,
};

static void intel_ir_reconfigure_irte(struct irq_data *irqd, bool force)
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 83f36f6..c4352fc 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -193,3 +193,12 @@ irq_remapping_get_irq_domain(struct irq_alloc_info *info)

return remap_ops->get_irq_domain(info);
}
+
+struct irq_domain *
+irq_remapping_get_ims_irq_domain(struct irq_alloc_info *info)
+{
+ if (!remap_ops || !remap_ops->get_ims_irq_domain)
+ return NULL;
+
+ return remap_ops->get_ims_irq_domain(info);
+}
diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
index 6a190d5..8e198ad 100644
--- a/drivers/iommu/irq_remapping.h
+++ b/drivers/iommu/irq_remapping.h
@@ -48,6 +48,9 @@ struct irq_remap_ops {

/* Get the MSI irqdomain associated with the IOMMU device */
struct irq_domain *(*get_irq_domain)(struct irq_alloc_info *);
+
+ /* Get the IMS irqdomain associated with the IOMMU device */
+ struct irq_domain *(*get_ims_irq_domain)(struct irq_alloc_info *);
};

extern struct irq_remap_ops intel_irq_remap_ops;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 4fc6454f..26be769 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -546,6 +546,7 @@ struct intel_iommu {
struct ir_table *ir_table; /* Interrupt remapping info */
struct irq_domain *ir_domain;
struct irq_domain *ir_msi_domain;
+ struct irq_domain *ir_ims_domain;
#endif
struct iommu_device iommu; /* IOMMU core code handle */
int node;
--
2.7.4

2019-09-13 03:01:00

by Megha Dey

[permalink] [raw]
Subject: [RFC V1 3/7] x86/ims: Add support for a new IMS irq domain

This patch adds support for the creation of a new IMS irq domain. It
creates a new irq_chip associated with the IMS domain and adds the
necessary domain operations to it.

Cc: Jacob Pan <[email protected]>
Signed-off-by: Sanjay Kumar <[email protected]>
Signed-off-by: Megha Dey <[email protected]>
---
arch/x86/include/asm/msi.h | 4 ++
arch/x86/kernel/apic/Makefile | 1 +
arch/x86/kernel/apic/ims.c | 93 ++++++++++++++++++++++++++++++++++++++++
arch/x86/kernel/apic/msi.c | 4 +-
drivers/vfio/mdev/mdev_core.c | 6 +++
drivers/vfio/mdev/mdev_private.h | 1 -
include/linux/mdev.h | 2 +
7 files changed, 108 insertions(+), 3 deletions(-)
create mode 100644 arch/x86/kernel/apic/ims.c

diff --git a/arch/x86/include/asm/msi.h b/arch/x86/include/asm/msi.h
index 25ddd09..51f9d25 100644
--- a/arch/x86/include/asm/msi.h
+++ b/arch/x86/include/asm/msi.h
@@ -11,4 +11,8 @@ int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,

void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc);

+struct msi_domain_info;
+
+irq_hw_number_t msi_get_hwirq(struct msi_domain_info *info,
+ msi_alloc_info_t *arg);
#endif /* _ASM_X86_MSI_H */
diff --git a/arch/x86/kernel/apic/Makefile b/arch/x86/kernel/apic/Makefile
index a6fcaf16..75a2270 100644
--- a/arch/x86/kernel/apic/Makefile
+++ b/arch/x86/kernel/apic/Makefile
@@ -12,6 +12,7 @@ obj-y += hw_nmi.o

obj-$(CONFIG_X86_IO_APIC) += io_apic.o
obj-$(CONFIG_PCI_MSI) += msi.o
+obj-$(CONFIG_MSI_IMS) += ims.o
obj-$(CONFIG_SMP) += ipi.o

ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/apic/ims.c b/arch/x86/kernel/apic/ims.c
new file mode 100644
index 0000000..d9808a5
--- /dev/null
+++ b/arch/x86/kernel/apic/ims.c
@@ -0,0 +1,93 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright © 2019 Intel Corporation.
+ *
+ * Author: Megha Dey <[email protected]>
+ */
+
+#include <linux/dmar.h>
+#include <linux/irq.h>
+#include <linux/mdev.h>
+#include <linux/pci.h>
+
+/*
+ * Determine if a dev is mdev or not. Return NULL if not mdev device.
+ * Return mdev's parent dev if success.
+ */
+static inline struct device *mdev_to_parent(struct device *dev)
+{
+ struct device *ret = NULL;
+ struct device *(*fn)(struct device *dev);
+ struct bus_type *bus = symbol_get(mdev_bus_type);
+
+ if (bus && dev->bus == bus) {
+ fn = symbol_get(mdev_dev_to_parent_dev);
+ ret = fn(dev);
+ symbol_put(mdev_dev_to_parent_dev);
+ symbol_put(mdev_bus_type);
+ }
+
+ return ret;
+}
+
+static struct pci_dev *ims_get_pci_dev(struct device *dev)
+{
+ struct pci_dev *pdev;
+
+ if (dev_is_mdev(dev)) {
+ struct device *parent = mdev_to_parent(dev);
+
+ pdev = to_pci_dev(parent);
+ } else {
+ pdev = to_pci_dev(dev);
+ }
+
+ return pdev;
+}
+
+int dev_ims_prepare(struct irq_domain *domain, struct device *dev, int nvec,
+ msi_alloc_info_t *arg)
+{
+ struct pci_dev *pdev = ims_get_pci_dev(dev);
+
+ init_irq_alloc_info(arg, NULL);
+ arg->msi_dev = pdev;
+ arg->type = X86_IRQ_ALLOC_TYPE_MSIX;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(dev_ims_prepare);
+
+#ifdef CONFIG_IRQ_REMAP
+
+static struct msi_domain_ops dev_ims_domain_ops = {
+ .get_hwirq = msi_get_hwirq,
+ .msi_prepare = dev_ims_prepare,
+};
+
+static struct irq_chip dev_ims_ir_controller = {
+ .name = "IR-DEV-IMS",
+ .irq_unmask = dev_ims_unmask_irq,
+ .irq_mask = dev_ims_mask_irq,
+ .irq_ack = irq_chip_ack_parent,
+ .irq_retrigger = irq_chip_retrigger_hierarchy,
+ .irq_set_vcpu_affinity = irq_chip_set_vcpu_affinity_parent,
+ .flags = IRQCHIP_SKIP_SET_WAKE,
+ .irq_write_msi_msg = dev_ims_write_msg,
+};
+
+static struct msi_domain_info ims_ir_domain_info = {
+ .flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
+ MSI_FLAG_MULTI_PCI_MSI | MSI_FLAG_PCI_MSIX,
+ .ops = &dev_ims_domain_ops,
+ .chip = &dev_ims_ir_controller,
+ .handler = handle_edge_irq,
+ .handler_name = "edge",
+};
+
+struct irq_domain *arch_create_ims_irq_domain(struct irq_domain *parent)
+{
+ return pci_msi_create_irq_domain(NULL, &ims_ir_domain_info, parent);
+}
+
+#endif
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 435bcda..65da813 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -84,7 +84,7 @@ void native_teardown_msi_irq(unsigned int irq)
irq_domain_free_irqs(irq, 1);
}

-static irq_hw_number_t pci_msi_get_hwirq(struct msi_domain_info *info,
+irq_hw_number_t msi_get_hwirq(struct msi_domain_info *info,
msi_alloc_info_t *arg)
{
return arg->msi_hwirq;
@@ -116,7 +116,7 @@ void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
EXPORT_SYMBOL_GPL(pci_msi_set_desc);

static struct msi_domain_ops pci_msi_domain_ops = {
- .get_hwirq = pci_msi_get_hwirq,
+ .get_hwirq = msi_get_hwirq,
.msi_prepare = pci_msi_prepare,
.set_desc = pci_msi_set_desc,
};
diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index b558d4c..cecc6a6 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -33,6 +33,12 @@ struct device *mdev_parent_dev(struct mdev_device *mdev)
}
EXPORT_SYMBOL(mdev_parent_dev);

+struct device *mdev_dev_to_parent_dev(struct device *dev)
+{
+ return to_mdev_device(dev)->parent->dev;
+}
+EXPORT_SYMBOL(mdev_dev_to_parent_dev);
+
void *mdev_get_drvdata(struct mdev_device *mdev)
{
return mdev->driver_data;
diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
index 7d92295..c21f130 100644
--- a/drivers/vfio/mdev/mdev_private.h
+++ b/drivers/vfio/mdev/mdev_private.h
@@ -36,7 +36,6 @@ struct mdev_device {
};

#define to_mdev_device(dev) container_of(dev, struct mdev_device, dev)
-#define dev_is_mdev(d) ((d)->bus == &mdev_bus_type)

struct mdev_type {
struct kobject kobj;
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index 0ce30ca..9dcbffe 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -144,5 +144,7 @@ void mdev_unregister_driver(struct mdev_driver *drv);
struct device *mdev_parent_dev(struct mdev_device *mdev);
struct device *mdev_dev(struct mdev_device *mdev);
struct mdev_device *mdev_from_dev(struct device *dev);
+struct device *mdev_dev_to_parent_dev(struct device *dev);

+#define dev_is_mdev(d) ((d)->bus == symbol_get(mdev_bus_type))
#endif /* MDEV_H */
--
2.7.4

2019-09-13 16:18:47

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [RFC V1 3/7] x86/ims: Add support for a new IMS irq domain

On Thu, Sep 12, 2019 at 06:32:04PM -0700, Megha Dey wrote:
> +/*
> + * Determine if a dev is mdev or not. Return NULL if not mdev device.
> + * Return mdev's parent dev if success.
> + */
> +static inline struct device *mdev_to_parent(struct device *dev)
> +{
> + struct device *ret = NULL;
> + struct device *(*fn)(struct device *dev);
> + struct bus_type *bus = symbol_get(mdev_bus_type);
> +
> + if (bus && dev->bus == bus) {
> + fn = symbol_get(mdev_dev_to_parent_dev);
> + ret = fn(dev);
> + symbol_put(mdev_dev_to_parent_dev);
> + symbol_put(mdev_bus_type);
> + }

Yuk, arch code should not have special knowledge about vfio-mdev, and
in any case the above way to get it is really gross.

> +static struct msi_domain_ops dev_ims_domain_ops = {
> + .get_hwirq = msi_get_hwirq,
> + .msi_prepare = dev_ims_prepare,
> +};
> +
> +static struct irq_chip dev_ims_ir_controller = {
> + .name = "IR-DEV-IMS",
> + .irq_unmask = dev_ims_unmask_irq,
> + .irq_mask = dev_ims_mask_irq,
> + .irq_ack = irq_chip_ack_parent,
> + .irq_retrigger = irq_chip_retrigger_hierarchy,
> + .irq_set_vcpu_affinity = irq_chip_set_vcpu_affinity_parent,
> + .flags = IRQCHIP_SKIP_SET_WAKE,
> + .irq_write_msi_msg = dev_ims_write_msg,
> +};

It seems really weird to wrapper an irq_chip like this, if the caller
is supposed to provide the functions more or less directly why not
have the caller create and own the irq chip? Is something in the way
of this? It looked like the platform version of this did the same API approach..

Jason

2019-09-13 20:31:04

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [RFC V1 0/7] Add support for a new IMS interrupt mechanism

On Thu, Sep 12, 2019 at 06:32:01PM -0700, Megha Dey wrote:

> This series is a basic patchset to get the ball rolling and receive some
> inital comments. As per my discussion with Marc Zyngier and Thomas Gleixner
> at the Linux Plumbers, I need to do the following:
> 1. Since a device can support MSI-X and IMS simultaneously, ensure proper
> locking mechanism for the 'msi_list' in the device structure.
> 2. Introduce dynamic allocation of IMS vectors perhaps by using a group ID
> 3. IMS support of a device needs to be discoverable. A bit in the vendor
> specific capability in the PCI config is to be added rather than getting
> this information from each device driver.

Why #3? The point of this scheme is to delegate programming the
addr/data pairs to the driver so it can be done in some
device-specific way. There is no PCI standard behind this, and no
change in PCI semantics.

I think it would be a tall ask to get a config space bit from PCI-SIG
for something that has little to do with PCI.

After seeing that we already have a platform device based version of
this same idea (drivers/base/platform-msi.c), I think the task here is
really just to extend and expand that approach to work generically for
platform and PCI devices. Along the way tidying the arch interface so
x86 and ARM's stuff to support that uses the same generic interfaces.

ie it is re-organizing code and ideas already in Linux, not defining
some new standard.

I also think refering to this existing idea by some new IMS name is
only confusing people what the goal is... Which is perhaps why #3 was
suggested??

Stated more clearly, I think all uses would be satisfied if
platform_msi_domain_alloc_irqs() could be called for struct
pci_device, could be called multiple times for the same struct
pci_device, and co-existed with MSI and MSI-X on the same pci_device.

Jason

2019-09-13 20:52:06

by Ashok Raj

[permalink] [raw]
Subject: Re: [RFC V1 0/7] Add support for a new IMS interrupt mechanism

On Fri, Sep 13, 2019 at 07:50:50PM +0000, Jason Gunthorpe wrote:
> On Thu, Sep 12, 2019 at 06:32:01PM -0700, Megha Dey wrote:
>
> > This series is a basic patchset to get the ball rolling and receive some
> > inital comments. As per my discussion with Marc Zyngier and Thomas Gleixner
> > at the Linux Plumbers, I need to do the following:
> > 1. Since a device can support MSI-X and IMS simultaneously, ensure proper
> > locking mechanism for the 'msi_list' in the device structure.
> > 2. Introduce dynamic allocation of IMS vectors perhaps by using a group ID
> > 3. IMS support of a device needs to be discoverable. A bit in the vendor
> > specific capability in the PCI config is to be added rather than getting
> > this information from each device driver.
>
> Why #3? The point of this scheme is to delegate programming the
> addr/data pairs to the driver so it can be done in some
> device-specific way. There is no PCI standard behind this, and no
> change in PCI semantics.
>
> I think it would be a tall ask to get a config space bit from PCI-SIG
> for something that has little to do with PCI.

This isn't a standard config capability. Its Designated Vendor Specific
Capability (DVSEC). The device is responsible for managing the addr-data
pair. This provides a hint to the OS framework that this device has
device specific methods.

Agreed its not required, but some OSV's like a generic way to discover
these capabilities, hence its there so device vendors can have
a common guideline.

Check here for some of those details:

https://software.intel.com/en-us/blogs/2018/06/25/introducing-intel-scalable-io-virtualization

>
> After seeing that we already have a platform device based version of
> this same idea (drivers/base/platform-msi.c), I think the task here is
> really just to extend and expand that approach to work generically for
> platform and PCI devices. Along the way tidying the arch interface so
> x86 and ARM's stuff to support that uses the same generic interfaces.
>
> ie it is re-organizing code and ideas already in Linux, not defining
> some new standard.
>
> I also think refering to this existing idea by some new IMS name is
> only confusing people what the goal is... Which is perhaps why #3 was
> suggested??
>
> Stated more clearly, I think all uses would be satisfied if
> platform_msi_domain_alloc_irqs() could be called for struct
> pci_device, could be called multiple times for the same struct
> pci_device, and co-existed with MSI and MSI-X on the same pci_device.
>
> Jason

2019-09-13 22:10:08

by Alex Williamson

[permalink] [raw]
Subject: Re: [RFC V1 3/7] x86/ims: Add support for a new IMS irq domain

On Thu, 12 Sep 2019 18:32:04 -0700
Megha Dey <[email protected]> wrote:

> This patch adds support for the creation of a new IMS irq domain. It
> creates a new irq_chip associated with the IMS domain and adds the
> necessary domain operations to it.
>
> Cc: Jacob Pan <[email protected]>
> Signed-off-by: Sanjay Kumar <[email protected]>
> Signed-off-by: Megha Dey <[email protected]>
> ---
> arch/x86/include/asm/msi.h | 4 ++
> arch/x86/kernel/apic/Makefile | 1 +
> arch/x86/kernel/apic/ims.c | 93 ++++++++++++++++++++++++++++++++++++++++
> arch/x86/kernel/apic/msi.c | 4 +-
> drivers/vfio/mdev/mdev_core.c | 6 +++
> drivers/vfio/mdev/mdev_private.h | 1 -
> include/linux/mdev.h | 2 +
> 7 files changed, 108 insertions(+), 3 deletions(-)
> create mode 100644 arch/x86/kernel/apic/ims.c
>
> diff --git a/arch/x86/include/asm/msi.h b/arch/x86/include/asm/msi.h
> index 25ddd09..51f9d25 100644
> --- a/arch/x86/include/asm/msi.h
> +++ b/arch/x86/include/asm/msi.h
> @@ -11,4 +11,8 @@ int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
>
> void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc);
>
> +struct msi_domain_info;
> +
> +irq_hw_number_t msi_get_hwirq(struct msi_domain_info *info,
> + msi_alloc_info_t *arg);
> #endif /* _ASM_X86_MSI_H */
> diff --git a/arch/x86/kernel/apic/Makefile b/arch/x86/kernel/apic/Makefile
> index a6fcaf16..75a2270 100644
> --- a/arch/x86/kernel/apic/Makefile
> +++ b/arch/x86/kernel/apic/Makefile
> @@ -12,6 +12,7 @@ obj-y += hw_nmi.o
>
> obj-$(CONFIG_X86_IO_APIC) += io_apic.o
> obj-$(CONFIG_PCI_MSI) += msi.o
> +obj-$(CONFIG_MSI_IMS) += ims.o
> obj-$(CONFIG_SMP) += ipi.o
>
> ifeq ($(CONFIG_X86_64),y)
> diff --git a/arch/x86/kernel/apic/ims.c b/arch/x86/kernel/apic/ims.c
> new file mode 100644
> index 0000000..d9808a5
> --- /dev/null
> +++ b/arch/x86/kernel/apic/ims.c
> @@ -0,0 +1,93 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright © 2019 Intel Corporation.
> + *
> + * Author: Megha Dey <[email protected]>
> + */
> +
> +#include <linux/dmar.h>
> +#include <linux/irq.h>
> +#include <linux/mdev.h>
> +#include <linux/pci.h>
> +
> +/*
> + * Determine if a dev is mdev or not. Return NULL if not mdev device.
> + * Return mdev's parent dev if success.
> + */
> +static inline struct device *mdev_to_parent(struct device *dev)
> +{
> + struct device *ret = NULL;
> + struct device *(*fn)(struct device *dev);
> + struct bus_type *bus = symbol_get(mdev_bus_type);
> +
> + if (bus && dev->bus == bus) {
> + fn = symbol_get(mdev_dev_to_parent_dev);
> + ret = fn(dev);
> + symbol_put(mdev_dev_to_parent_dev);
> + symbol_put(mdev_bus_type);

Leaks a reference to the mdev module if dev->bus != bus. The new
version of dev_is_mdev() unconditionally leaks a reference. Thanks,

Alex

> + }
> +
> + return ret;
> +}
> +
> +static struct pci_dev *ims_get_pci_dev(struct device *dev)
> +{
> + struct pci_dev *pdev;
> +
> + if (dev_is_mdev(dev)) {
> + struct device *parent = mdev_to_parent(dev);
> +
> + pdev = to_pci_dev(parent);
> + } else {
> + pdev = to_pci_dev(dev);
> + }
> +
> + return pdev;
> +}
> +
> +int dev_ims_prepare(struct irq_domain *domain, struct device *dev, int nvec,
> + msi_alloc_info_t *arg)
> +{
> + struct pci_dev *pdev = ims_get_pci_dev(dev);
> +
> + init_irq_alloc_info(arg, NULL);
> + arg->msi_dev = pdev;
> + arg->type = X86_IRQ_ALLOC_TYPE_MSIX;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(dev_ims_prepare);
> +
> +#ifdef CONFIG_IRQ_REMAP
> +
> +static struct msi_domain_ops dev_ims_domain_ops = {
> + .get_hwirq = msi_get_hwirq,
> + .msi_prepare = dev_ims_prepare,
> +};
> +
> +static struct irq_chip dev_ims_ir_controller = {
> + .name = "IR-DEV-IMS",
> + .irq_unmask = dev_ims_unmask_irq,
> + .irq_mask = dev_ims_mask_irq,
> + .irq_ack = irq_chip_ack_parent,
> + .irq_retrigger = irq_chip_retrigger_hierarchy,
> + .irq_set_vcpu_affinity = irq_chip_set_vcpu_affinity_parent,
> + .flags = IRQCHIP_SKIP_SET_WAKE,
> + .irq_write_msi_msg = dev_ims_write_msg,
> +};
> +
> +static struct msi_domain_info ims_ir_domain_info = {
> + .flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
> + MSI_FLAG_MULTI_PCI_MSI | MSI_FLAG_PCI_MSIX,
> + .ops = &dev_ims_domain_ops,
> + .chip = &dev_ims_ir_controller,
> + .handler = handle_edge_irq,
> + .handler_name = "edge",
> +};
> +
> +struct irq_domain *arch_create_ims_irq_domain(struct irq_domain *parent)
> +{
> + return pci_msi_create_irq_domain(NULL, &ims_ir_domain_info, parent);
> +}
> +
> +#endif
> diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
> index 435bcda..65da813 100644
> --- a/arch/x86/kernel/apic/msi.c
> +++ b/arch/x86/kernel/apic/msi.c
> @@ -84,7 +84,7 @@ void native_teardown_msi_irq(unsigned int irq)
> irq_domain_free_irqs(irq, 1);
> }
>
> -static irq_hw_number_t pci_msi_get_hwirq(struct msi_domain_info *info,
> +irq_hw_number_t msi_get_hwirq(struct msi_domain_info *info,
> msi_alloc_info_t *arg)
> {
> return arg->msi_hwirq;
> @@ -116,7 +116,7 @@ void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
> EXPORT_SYMBOL_GPL(pci_msi_set_desc);
>
> static struct msi_domain_ops pci_msi_domain_ops = {
> - .get_hwirq = pci_msi_get_hwirq,
> + .get_hwirq = msi_get_hwirq,
> .msi_prepare = pci_msi_prepare,
> .set_desc = pci_msi_set_desc,
> };
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index b558d4c..cecc6a6 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -33,6 +33,12 @@ struct device *mdev_parent_dev(struct mdev_device *mdev)
> }
> EXPORT_SYMBOL(mdev_parent_dev);
>
> +struct device *mdev_dev_to_parent_dev(struct device *dev)
> +{
> + return to_mdev_device(dev)->parent->dev;
> +}
> +EXPORT_SYMBOL(mdev_dev_to_parent_dev);
> +
> void *mdev_get_drvdata(struct mdev_device *mdev)
> {
> return mdev->driver_data;
> diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
> index 7d92295..c21f130 100644
> --- a/drivers/vfio/mdev/mdev_private.h
> +++ b/drivers/vfio/mdev/mdev_private.h
> @@ -36,7 +36,6 @@ struct mdev_device {
> };
>
> #define to_mdev_device(dev) container_of(dev, struct mdev_device, dev)
> -#define dev_is_mdev(d) ((d)->bus == &mdev_bus_type)
>
> struct mdev_type {
> struct kobject kobj;
> diff --git a/include/linux/mdev.h b/include/linux/mdev.h
> index 0ce30ca..9dcbffe 100644
> --- a/include/linux/mdev.h
> +++ b/include/linux/mdev.h
> @@ -144,5 +144,7 @@ void mdev_unregister_driver(struct mdev_driver *drv);
> struct device *mdev_parent_dev(struct mdev_device *mdev);
> struct device *mdev_dev(struct mdev_device *mdev);
> struct mdev_device *mdev_from_dev(struct device *dev);
> +struct device *mdev_dev_to_parent_dev(struct device *dev);
>
> +#define dev_is_mdev(d) ((d)->bus == symbol_get(mdev_bus_type))
> #endif /* MDEV_H */

2019-09-19 22:16:50

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [RFC V1 0/7] Add support for a new IMS interrupt mechanism

On Fri, Sep 13, 2019 at 01:27:10PM -0700, Raj, Ashok wrote:
> On Fri, Sep 13, 2019 at 07:50:50PM +0000, Jason Gunthorpe wrote:
> > On Thu, Sep 12, 2019 at 06:32:01PM -0700, Megha Dey wrote:
> >
> > > This series is a basic patchset to get the ball rolling and receive some
> > > inital comments. As per my discussion with Marc Zyngier and Thomas Gleixner
> > > at the Linux Plumbers, I need to do the following:
> > > 1. Since a device can support MSI-X and IMS simultaneously, ensure proper
> > > locking mechanism for the 'msi_list' in the device structure.
> > > 2. Introduce dynamic allocation of IMS vectors perhaps by using a group ID
> > > 3. IMS support of a device needs to be discoverable. A bit in the vendor
> > > specific capability in the PCI config is to be added rather than getting
> > > this information from each device driver.
> >
> > Why #3? The point of this scheme is to delegate programming the
> > addr/data pairs to the driver so it can be done in some
> > device-specific way. There is no PCI standard behind this, and no
> > change in PCI semantics.
> >
> > I think it would be a tall ask to get a config space bit from PCI-SIG
> > for something that has little to do with PCI.
>
> This isn't a standard config capability. Its Designated Vendor Specific
> Capability (DVSEC). The device is responsible for managing the addr-data
> pair. This provides a hint to the OS framework that this device has
> device specific methods.
>
> Agreed its not required, but some OSV's like a generic way to discover
> these capabilities, hence its there so device vendors can have
> a common guideline.

I think it would be reasonable for a specific device driver to test
the DVSEC, if it wishes too.

Since it is not required it does not make sense for the core kernel to
enforce this on all devices, at least for such a nebulous reason.

Jason