2016-04-04 08:19:46

by Eric Auger

[permalink] [raw]
Subject: [PATCH v6 0/4] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 2/3: msi changes

This series implements the MSI address mapping/unmapping in the MSI
sub-system. binding happens on msi_domain_set_affinity, msi_domain_activate,
msi_domain_deactivate.

a new MSI domain info flag value is introduced to report whether the msi domain
implements IRQ remapping. GIC v3 ITS is the first MSI controller advertising
it. This flag value will be used by VFIO subsystem to determine whether
MSI forwarding is safe.

More details & context can be found at:
http://www.linaro.org/blog/core-dump/kvm-pciemsi-passthrough-armarm64/

Best Regards

Eric

Applies on top of part 1/3.

Git: complete series available at
https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.6-rc1-pcie-passthrough-v6

History:

RFC v5 -> patch v6:
- split to ease the review process
- rebase on default iommu domain code (irq_data_to_msi_mapping_domain
checks IOMMU_DOMAIN_DMA type)
- fix unmap sequence on msi_domain_set_affinity (reported by Marc):
unmap the previous doorbell when the new one has been mapped & written to
the device, ie. irq_chip_write_msi_msg.
- "msi: msi_compose wrapper removed" following change above
- add size parameter to iommu_get_reserved_iova API following Marc's request

RFC v4 -> RFC v5:
- take into account Thomas' comments on MSI related patches
- split "msi: IOMMU map the doorbell address when needed"
- increase readability and add comments
- fix style issues
- split "iommu: Add DOMAIN_ATTR_MSI_MAPPING attribute"
- platform ITS now advertises IOMMU_CAP_INTR_REMAP
- fix compilation issue with CONFIG_IOMMU API unset
- arm-smmu-v3 now advertises DOMAIN_ATTR_MSI_MAPPING

RFC v3 -> v4:
- Move doorbell mapping/unmapping in msi.c
- fix ref count issue on set_affinity: in case of a change in the address
the previous address is decremented
- doorbell map/unmap now is done on msi composition. Should allow the use
case for platform MSI controllers
- create dma-reserved-iommu.h/c exposing/implementing a new API dedicated
to reserved IOVA management (looking like dma-iommu glue)
- series reordering to ease the review:
- first part is related to IOMMU
- second related to MSI sub-system
- third related to VFIO (except arm-smmu IOMMU_CAP_INTR_REMAP removal)
- expose the number of requested IOVA pages through VFIO_IOMMU_GET_INFO
[this partially addresses Marc's comments on iommu_get/put_single_reserved
size/alignment problematic - which I did not ignore - but I don't know
how much I can do at the moment]

RFC v2 -> RFC v3:
- should fix wrong handling of some CONFIG combinations:
CONFIG_IOVA, CONFIG_IOMMU_API, CONFIG_PCI_MSI_IRQ_DOMAIN
- fix MSI_FLAG_IRQ_REMAPPING setting in GICv3 ITS (although not tested)

PATCH v1 -> RFC v2:
- reverted to RFC since it looks more reasonable ;-) the code is split
between VFIO, IOMMU, MSI controller and I am not sure I did the right
choices. Also API need to be further discussed.
- iova API usage in arm-smmu.c.
- MSI controller natively programs the MSI addr with either the PA or IOVA.
This is not done anymore in vfio-pci driver as suggested by Alex.
- check irq remapping capability of the group

RFC v1 [2] -> PATCH v1:
- use the existing dma map/unmap ioctl interface with a flag to register a
reserved IOVA range. Use the legacy Rb to store this special vfio_dma.
- a single reserved IOVA contiguous region now is allowed
- use of an RB tree indexed by PA to store allocated reserved slots
- use of a vfio_domain iova_domain to manage iova allocation within the
window provided by the userspace
- vfio alloc_map/unmap_free take a vfio_group handle
- vfio_group handle is cached in vfio_pci_device
- add ref counting to bindings
- user modality enabled at the end of the series


Eric Auger (4):
msi: Add a new MSI_FLAG_IRQ_REMAPPING flag
irqchip/gic-v3-its: ITS advertises MSI_FLAG_IRQ_REMAPPING
msi: export msi_get_domain_info
msi: IOMMU map the doorbell address when needed

drivers/irqchip/irq-gic-v3-its-pci-msi.c | 3 +-
drivers/irqchip/irq-gic-v3-its-platform-msi.c | 3 +-
include/linux/msi.h | 17 ++++
kernel/irq/msi.c | 124 ++++++++++++++++++++++++++
4 files changed, 145 insertions(+), 2 deletions(-)

--
1.9.1


2016-04-04 08:19:55

by Eric Auger

[permalink] [raw]
Subject: [PATCH v6 3/4] msi: export msi_get_domain_info

We plan to use msi_get_domain_info in VFIO module so let's export it.

Signed-off-by: Eric Auger <[email protected]>

---

v2 -> v3:
- remove static implementation in case CONFIG_PCI_MSI_IRQ_DOMAIN is not set
---
kernel/irq/msi.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 38e89ce..9b0ba4a 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -400,5 +400,6 @@ struct msi_domain_info *msi_get_domain_info(struct irq_domain *domain)
{
return (struct msi_domain_info *)domain->host_data;
}
+EXPORT_SYMBOL_GPL(msi_get_domain_info);

#endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */
--
1.9.1

2016-04-04 08:20:01

by Eric Auger

[permalink] [raw]
Subject: [PATCH v6 4/4] msi: IOMMU map the doorbell address when needed

In case the msi is emitted by a device attached to an iommu domain
and this iommu domain requires MSI mapping, the msi address (aka
doorbell) must be mapped in the IOMMU. Else MSI write transaction
will cause a fault.

We handle the iommu mapping/unmapping anytime the msi address is
updated.

In case the mapping fails we just WARN_ON.

Signed-off-by: Eric Auger <[email protected]>

---

v6:
- check the domain type to detect bypass situation (rebase on
new default domain modality)
- fix the msi_domain_set_affinity sequence:
unmap the address once the PCI device has been given the new address
- use new proto for iommu_put_reserved_iova. New function names

v5:
- use macros to increase the readability
- add comments
- fix a typo that caused a compilation error if CONFIG_IOMMU_API
is not set
---
include/linux/msi.h | 15 +++++++
kernel/irq/msi.c | 123 ++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 138 insertions(+)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index 08441b1..1dc41db 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -10,6 +10,21 @@ struct msi_msg {
u32 data; /* 16 bits of msi message data */
};

+/* Helpers to convert the msi message address to a an iova/physical address */
+#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
+#define msg_to_dma_addr(msg) \
+ (((dma_addr_t)((msg)->address_hi) << 32) | (msg)->address_lo)
+#else
+#define msg_to_dma_addr(msg) ((msg)->address_lo)
+#endif
+
+#ifdef CONFIG_PHYS_ADDR_T_64BIT
+#define msg_to_phys_addr(msg) \
+ (((phys_addr_t)((msg)->address_hi) << 32) | (msg)->address_lo)
+#else
+#define msg_to_phys_addr(msg) ((msg)->address_lo)
+#endif
+
extern int pci_msi_ignore_mask;
/* Helper functions */
struct irq_data;
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 9b0ba4a..7440653 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -17,6 +17,8 @@

/* Temparory solution for building, will be removed later */
#include <linux/pci.h>
+#include <linux/iommu.h>
+#include <linux/dma-reserved-iommu.h>

struct msi_desc *alloc_msi_entry(struct device *dev)
{
@@ -56,6 +58,94 @@ static inline void irq_chip_write_msi_msg(struct irq_data *data,
}

/**
+ * msi_map_doorbell: make sure an IOMMU mapping exists on domain @d
+ * for the message physical address (aka. doorbell)
+ *
+ * Either allocate an IOVA and create a mapping or simply increment
+ * a reference count on the existing IOMMU mapping
+ * @d: iommu domain handle the mapping belongs to
+ * @msg: msi message handle
+ */
+static int msi_map_doorbell(struct iommu_domain *d, struct msi_msg *msg)
+{
+#ifdef CONFIG_IOMMU_DMA_RESERVED
+ phys_addr_t addr;
+ dma_addr_t iova;
+ int ret;
+
+ addr = msg_to_phys_addr(msg);
+ ret = iommu_get_reserved_iova(d, addr, sizeof(addr), IOMMU_WRITE, &iova);
+ if (!ret) {
+ msg->address_lo = lower_32_bits(iova);
+ msg->address_hi = upper_32_bits(iova);
+ }
+ return ret;
+#else
+ return -ENODEV;
+#endif
+}
+
+/**
+ * msi_unmap_doorbell: decrements the reference count on an existing
+ * doorbell IOMMU mapping
+ *
+ * @d: iommu domain the mapping is attached to
+ * @msg: msi message containing the doorbell IOVA to unbind
+ */
+static void msi_unmap_doorbell(struct iommu_domain *d, struct msi_msg *msg)
+{
+#ifdef CONFIG_IOMMU_DMA_RESERVED
+ dma_addr_t iova;
+
+ iova = msg_to_dma_addr(msg);
+ iommu_put_reserved_iova(d, iova);
+#endif
+}
+
+#ifdef CONFIG_IOMMU_API
+/**
+ * irq_data_to_msi_mapping_domain: checks if an irq corresponds to
+ * an MSI whose write address must be mapped in an IOMMU domain
+ *
+ * determine whether the irq corresponds to an MSI emitted by a device,
+ * upstream to an IOMMU, and if this IOMMU requires a binding of the
+ * MSI address
+ *
+ * @irq_data: irq data handle
+ */
+static struct iommu_domain *
+irq_data_to_msi_mapping_domain(struct irq_data *irq_data)
+{
+ struct iommu_domain *d;
+ struct msi_desc *desc;
+ struct device *dev;
+ int ret;
+
+ desc = irq_data_get_msi_desc(irq_data);
+ if (!desc)
+ return NULL;
+
+ dev = msi_desc_to_dev(desc);
+
+ d = iommu_get_domain_for_dev(dev);
+ if (!d || (d->type == IOMMU_DOMAIN_DMA))
+ return NULL;
+
+ ret = iommu_domain_get_attr(d, DOMAIN_ATTR_MSI_MAPPING, NULL);
+ if (!ret)
+ return d;
+ else
+ return NULL;
+}
+#else
+static inline struct iommu_domain *
+irq_data_to_msi_mapping_domain(struct irq_data *irq_data)
+{
+ return NULL;
+}
+#endif /* CONFIG_IOMMU_API */
+
+/**
* msi_domain_set_affinity - Generic affinity setter function for MSI domains
* @irq_data: The irq data associated to the interrupt
* @mask: The affinity mask to set
@@ -73,8 +163,27 @@ int msi_domain_set_affinity(struct irq_data *irq_data,

ret = parent->chip->irq_set_affinity(parent, mask, force);
if (ret >= 0 && ret != IRQ_SET_MASK_OK_DONE) {
+ bool doorbell_change = false;
+ struct iommu_domain *d;
+ struct msi_msg old_msg;
+
+ d = irq_data_to_msi_mapping_domain(irq_data);
+ if (unlikely(d)) {
+ get_cached_msi_msg(irq_data->irq, &old_msg);
+ doorbell_change =
+ (old_msg.address_lo != msg.address_lo) ||
+ (old_msg.address_hi != msg.address_hi);
+ }
+
BUG_ON(irq_chip_compose_msi_msg(irq_data, &msg));
+
+ if (unlikely(doorbell_change))
+ WARN_ON(msi_map_doorbell(d, &msg));
+
irq_chip_write_msi_msg(irq_data, &msg);
+
+ if (unlikely(doorbell_change))
+ msi_unmap_doorbell(d, &old_msg);
}

return ret;
@@ -83,19 +192,33 @@ int msi_domain_set_affinity(struct irq_data *irq_data,
static void msi_domain_activate(struct irq_domain *domain,
struct irq_data *irq_data)
{
+ struct iommu_domain *d;
struct msi_msg msg;

+ d = irq_data_to_msi_mapping_domain(irq_data);
+
BUG_ON(irq_chip_compose_msi_msg(irq_data, &msg));
+ if (unlikely(d))
+ WARN_ON(msi_map_doorbell(d, &msg));
irq_chip_write_msi_msg(irq_data, &msg);
}

static void msi_domain_deactivate(struct irq_domain *domain,
struct irq_data *irq_data)
{
+ struct iommu_domain *d;
+ struct msi_msg old_msg;
struct msi_msg msg;

+ d = irq_data_to_msi_mapping_domain(irq_data);
+ if (unlikely(d))
+ get_cached_msi_msg(irq_data->irq, &old_msg);
+
memset(&msg, 0, sizeof(msg));
irq_chip_write_msi_msg(irq_data, &msg);
+
+ if (unlikely(d))
+ msi_unmap_doorbell(d, &old_msg);
}

static int msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
--
1.9.1

2016-04-04 08:20:55

by Eric Auger

[permalink] [raw]
Subject: [PATCH v6 2/4] irqchip/gic-v3-its: ITS advertises MSI_FLAG_IRQ_REMAPPING

The ITS is the first ARM MSI controller advertising the new
MSI_FLAG_IRQ_REMAPPING flag. It does so because it supports
interrupt translation service. This HW support offers isolation
of MSIs, feature used when using KVM device passthrough.

Signed-off-by: Eric Auger <[email protected]>

---

v5: new
---
drivers/irqchip/irq-gic-v3-its-pci-msi.c | 3 ++-
drivers/irqchip/irq-gic-v3-its-platform-msi.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its-pci-msi.c b/drivers/irqchip/irq-gic-v3-its-pci-msi.c
index aee60ed..8223765 100644
--- a/drivers/irqchip/irq-gic-v3-its-pci-msi.c
+++ b/drivers/irqchip/irq-gic-v3-its-pci-msi.c
@@ -96,7 +96,8 @@ static struct msi_domain_ops its_pci_msi_ops = {

static struct msi_domain_info its_pci_msi_domain_info = {
.flags = (MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
- MSI_FLAG_MULTI_PCI_MSI | MSI_FLAG_PCI_MSIX),
+ MSI_FLAG_MULTI_PCI_MSI | MSI_FLAG_PCI_MSIX |
+ MSI_FLAG_IRQ_REMAPPING),
.ops = &its_pci_msi_ops,
.chip = &its_msi_irq_chip,
};
diff --git a/drivers/irqchip/irq-gic-v3-its-platform-msi.c b/drivers/irqchip/irq-gic-v3-its-platform-msi.c
index 470b4aa..8c0d69d 100644
--- a/drivers/irqchip/irq-gic-v3-its-platform-msi.c
+++ b/drivers/irqchip/irq-gic-v3-its-platform-msi.c
@@ -63,7 +63,8 @@ static struct msi_domain_ops its_pmsi_ops = {
};

static struct msi_domain_info its_pmsi_domain_info = {
- .flags = (MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS),
+ .flags = (MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
+ MSI_FLAG_IRQ_REMAPPING),
.ops = &its_pmsi_ops,
.chip = &its_pmsi_irq_chip,
};
--
1.9.1

2016-04-04 08:22:04

by Eric Auger

[permalink] [raw]
Subject: [PATCH v6 1/4] msi: Add a new MSI_FLAG_IRQ_REMAPPING flag

Let's introduce a new msi_domain_info flag value, MSI_FLAG_IRQ_REMAPPING
meant to tell the domain supports IRQ REMAPPING, also known as Interrupt
Translation Service. On Intel HW this IRQ remapping capability is
abstracted on IOMMU side while on ARM it is abstracted on MSI controller
side. This flag will be used to know whether the MSI passthrough is
safe.

Signed-off-by: Eric Auger <[email protected]>

---

v4 -> v5:
- seperate flag introduction from first user addition (ITS)
---
include/linux/msi.h | 2 ++
1 file changed, 2 insertions(+)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index 8b425c6..08441b1 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -270,6 +270,8 @@ enum {
MSI_FLAG_MULTI_PCI_MSI = (1 << 3),
/* Support PCI MSIX interrupts */
MSI_FLAG_PCI_MSIX = (1 << 4),
+ /* Support MSI IRQ remapping service */
+ MSI_FLAG_IRQ_REMAPPING = (1 << 5),
};

int msi_domain_set_affinity(struct irq_data *data, const struct cpumask *mask,
--
1.9.1

2016-04-04 09:23:23

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v6 4/4] msi: IOMMU map the doorbell address when needed

Hi Eric,

[auto build test ERROR on tip/irq/core]
[also build test ERROR on v4.6-rc2 next-20160404]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url: https://github.com/0day-ci/linux/commits/Eric-Auger/KVM-PCIe-MSI-passthrough-on-ARM-ARM64-kernel-part-2-3-msi-changes/20160404-162527
config: x86_64-randconfig-s1-04041632 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64

All errors (new ones prefixed by >>):

>> kernel/irq/msi.c:21:38: fatal error: linux/dma-reserved-iommu.h: No such file or directory
compilation terminated.

vim +21 kernel/irq/msi.c

15 #include <linux/irqdomain.h>
16 #include <linux/msi.h>
17
18 /* Temparory solution for building, will be removed later */
19 #include <linux/pci.h>
20 #include <linux/iommu.h>
> 21 #include <linux/dma-reserved-iommu.h>
22
23 struct msi_desc *alloc_msi_entry(struct device *dev)
24 {

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
(No filename) (1.14 kB)
.config.gz (27.19 kB)
Download all attachments

2016-04-04 09:29:37

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v6 4/4] msi: IOMMU map the doorbell address when needed

On 04/04/2016 11:22 AM, kbuild test robot wrote:
> Hi Eric,
>
> [auto build test ERROR on tip/irq/core]
> [also build test ERROR on v4.6-rc2 next-20160404]
> [if your patch is applied to the wrong git tree, please drop us a note to help improving the system]
>
> url: https://github.com/0day-ci/linux/commits/Eric-Auger/KVM-PCIe-MSI-passthrough-on-ARM-ARM64-kernel-part-2-3-msi-changes/20160404-162527
> config: x86_64-randconfig-s1-04041632 (attached as .config)
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64
>
> All errors (new ones prefixed by >>):
>
>>> kernel/irq/msi.c:21:38: fatal error: linux/dma-reserved-iommu.h: No such file or directory
> compilation terminated.

Yes this was expected. this series needs to be applied on part 1: [PATCH
v6 0/7] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 1/3: iommu
changes, https://lkml.org/lkml/2016/4/4/90.

Same for part 3.

I split the whole series to ease the review process however this comes
at the expense of automatic testing. Sorry for that.

Best Regards

Eric


>
> vim +21 kernel/irq/msi.c
>
> 15 #include <linux/irqdomain.h>
> 16 #include <linux/msi.h>
> 17
> 18 /* Temparory solution for building, will be removed later */
> 19 #include <linux/pci.h>
> 20 #include <linux/iommu.h>
> > 21 #include <linux/dma-reserved-iommu.h>
> 22
> 23 struct msi_desc *alloc_msi_entry(struct device *dev)
> 24 {
>
> ---
> 0-DAY kernel test infrastructure Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all Intel Corporation
>