LinuxLists.cc - [PATCH v4 0/2] PCI: hv: Hyper-V vPCI for arm64

2021-11-10 00:26:46

Subject: [PATCH v4 0/2] PCI: hv: Hyper-V vPCI for arm64

From: Sunil Muthuswamy <[email protected]>

Current Hyper-V vPCI code only compiles and works for x86. There are some
hardcoded assumptions about the architectural IRQ chip and other arch
defines.

Add support for Hyper-V vPCI for arm64 by first breaking the current hard
coded dependency using a set of new interfaces and implementing those for
x86 first. That is in the first patch. The second patch adds support for
Hyper-V vPCI for arm64 by implementing the above mentioned interfaces. That
is done by introducing a Hyper-V vPCI specific MSI IRQ domain & chip for
allocating SPI vectors.

changes in v1 -> v2:
- Moved the irqchip implementation to drivers/pci as suggested
by Marc Zyngier
- Addressed Multi-MSI handling issues identified by Marc Zyngier
- Addressed lock/synchronization primitive as suggested by Marc
Zyngier
- Addressed other code feedback from Marc Zyngier

changes in v2 -> v3:
- Addressed comments from Bjorn Helgaas about patch formatting and
verbiage
- Using 'git send-email' to ensure that the patch series is correctly
threaded. Feedback by Bjorn Helgaas
- Fixed Hyper-V vPCI build break for module build, reported by Boqun Feng

changes in v3 -> v4:
- Removed the separate file for IRQ chip that was there in previous
iterations and moved the IRQ chip implementation to pci-hyperv.c.
Feedback by Michael Kelley and Marc Zyngier.
- Addressed various comments from Marc Zyngier about structuring and
layout.
- Addressed comment from Marc Zyngier about IRQ affinity and other
miscellaneous comments.

Sunil Muthuswamy (2):
PCI: hv: Make the code arch neutral by adding arch specific interfaces
arm64: PCI: hv: Add support for Hyper-V vPCI

arch/arm64/include/asm/hyperv-tlfs.h | 9 +
arch/x86/include/asm/hyperv-tlfs.h | 33 ++++
arch/x86/include/asm/mshyperv.h | 7 -
drivers/pci/Kconfig | 2 +-
drivers/pci/controller/Kconfig | 2 +-
drivers/pci/controller/pci-hyperv.c | 284 ++++++++++++++++++++++++---
include/asm-generic/hyperv-tlfs.h | 33 ----
7 files changed, 303 insertions(+), 67 deletions(-)

base-commit: e4e737bb5c170df6135a127739a9e6148ee3da82
--
2.25.1

2021-11-10 00:26:46

by Sunil Muthuswamy

[permalink] [raw]

Subject: [PATCH v4 2/2] arm64: PCI: hv: Add support for Hyper-V vPCI

From: Sunil Muthuswamy <[email protected]>

Add support for Hyper-V vPCI for arm64 by implementing the arch specific
interfaces. Introduce an IRQ domain and chip specific to Hyper-v vPCI that
is based on SPIs. The IRQ domain parents itself to the arch GIC IRQ domain
for basic vector management.

Signed-off-by: Sunil Muthuswamy <[email protected]>
---
In v2, v3 & v4:
Changes are described in the cover letter.

arch/arm64/include/asm/hyperv-tlfs.h | 9 ++
drivers/pci/Kconfig | 2 +-
drivers/pci/controller/Kconfig | 2 +-
drivers/pci/controller/pci-hyperv.c | 207 ++++++++++++++++++++++++++-
4 files changed, 217 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/hyperv-tlfs.h b/arch/arm64/include/asm/hyperv-tlfs.h
index 4d964a7f02ee..bc6c7ac934a1 100644
--- a/arch/arm64/include/asm/hyperv-tlfs.h
+++ b/arch/arm64/include/asm/hyperv-tlfs.h
@@ -64,6 +64,15 @@
#define HV_REGISTER_STIMER0_CONFIG 0x000B0000
#define HV_REGISTER_STIMER0_COUNT 0x000B0001

+union hv_msi_entry {
+ u64 as_uint64[2];
+ struct {
+ u64 address;
+ u32 data;
+ u32 reserved;
+ } __packed;
+};
+
#include <asm-generic/hyperv-tlfs.h>

#endif
diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 0c473d75e625..cc2471488b61 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -184,7 +184,7 @@ config PCI_LABEL

config PCI_HYPERV
tristate "Hyper-V PCI Frontend"
- depends on X86_64 && HYPERV && PCI_MSI && PCI_MSI_IRQ_DOMAIN && SYSFS
+ depends on ((X86 && X86_64) || ARM64) && HYPERV && PCI_MSI && PCI_MSI_IRQ_DOMAIN && SYSFS
select PCI_HYPERV_INTERFACE
help
The PCI device frontend driver allows the kernel to import arbitrary
diff --git a/drivers/pci/controller/Kconfig b/drivers/pci/controller/Kconfig
index 326f7d13024f..b24edba0b870 100644
--- a/drivers/pci/controller/Kconfig
+++ b/drivers/pci/controller/Kconfig
@@ -280,7 +280,7 @@ config PCIE_BRCMSTB

config PCI_HYPERV_INTERFACE
tristate "Hyper-V PCI Interface"
- depends on X86 && HYPERV && PCI_MSI && PCI_MSI_IRQ_DOMAIN && X86_64
+ depends on ((X86 && X86_64) || ARM64) && HYPERV && PCI_MSI && PCI_MSI_IRQ_DOMAIN
help
The Hyper-V PCI Interface is a helper driver allows other drivers to
have a common interface with the Hyper-V PCI frontend driver.
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 03e07a4f0e3f..8ab57582b3a5 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -47,6 +47,8 @@
#include <linux/msi.h>
#include <linux/hyperv.h>
#include <linux/refcount.h>
+#include <linux/irqdomain.h>
+#include <linux/acpi.h>
#include <asm/mshyperv.h>

/*
@@ -614,7 +616,205 @@ static int hv_msi_prepare(struct irq_domain *domain, struct device *dev,
{
return pci_msi_prepare(domain, dev, nvec, info);
}
-#endif // CONFIG_X86
+#elif defined(CONFIG_ARM64)
+/*
+ * SPI vectors to use for vPCI; arch SPIs range is [32, 1019], but leaving a bit
+ * of room at the start to allow for SPIs to be specified through ACPI and
+ * starting with a power of two to satisfy power of 2 multi-MSI requirement.
+ */
+#define HV_PCI_MSI_SPI_START 64
+#define HV_PCI_MSI_SPI_NR (1020 - HV_PCI_MSI_SPI_START)
+#define DELIVERY_MODE 0
+#define FLOW_HANDLER NULL
+#define FLOW_NAME NULL
+#define hv_msi_prepare NULL
+
+struct hv_pci_chip_data {
+ DECLARE_BITMAP(spi_map, HV_PCI_MSI_SPI_NR);
+ struct mutex map_lock;
+};
+
+/* Hyper-V vPCI MSI GIC IRQ domain */
+static struct irq_domain *hv_msi_gic_irq_domain;
+
+/* Hyper-V PCI MSI IRQ chip */
+static struct irq_chip hv_arm64_msi_irq_chip = {
+ .name = "MSI",
+ .irq_set_affinity = irq_chip_set_affinity_parent,
+ .irq_eoi = irq_chip_eoi_parent,
+ .irq_mask = irq_chip_mask_parent,
+ .irq_unmask = irq_chip_unmask_parent
+};
+
+static unsigned int hv_msi_get_int_vector(struct irq_data *irqd)
+{
+ return irqd->parent_data->hwirq;
+}
+
+static void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry,
+ struct msi_desc *msi_desc)
+{
+ msi_entry->address = ((u64)msi_desc->msg.address_hi << 32) |
+ msi_desc->msg.address_lo;
+ msi_entry->data = msi_desc->msg.data;
+}
+
+static void hv_pci_vec_irq_domain_free(struct irq_domain *domain,
+ unsigned int virq, unsigned int nr_irqs)
+{
+ struct hv_pci_chip_data *chip_data = domain->host_data;
+ struct irq_data *irqd = irq_domain_get_irq_data(domain, virq);
+ int first = irqd->hwirq - HV_PCI_MSI_SPI_START;
+
+ mutex_lock(&chip_data->map_lock);
+ bitmap_release_region(chip_data->spi_map,
+ first,
+ get_count_order(nr_irqs));
+ mutex_unlock(&chip_data->map_lock);
+ irq_domain_reset_irq_data(irqd);
+ irq_domain_free_irqs_parent(domain, virq, nr_irqs);
+}
+
+static int hv_pci_vec_alloc_device_irq(struct irq_domain *domain,
+ unsigned int nr_irqs,
+ irq_hw_number_t *hwirq)
+{
+ struct hv_pci_chip_data *chip_data = domain->host_data;
+ unsigned int index;
+
+ /* Find and allocate region from the SPI bitmap */
+ mutex_lock(&chip_data->map_lock);
+ index = bitmap_find_free_region(chip_data->spi_map,
+ HV_PCI_MSI_SPI_NR,
+ get_count_order(nr_irqs));
+ mutex_unlock(&chip_data->map_lock);
+ if (index < 0)
+ return -ENOSPC;
+
+ *hwirq = index + HV_PCI_MSI_SPI_START;
+
+ return 0;
+}
+
+static int hv_pci_vec_irq_gic_domain_alloc(struct irq_domain *domain,
+ unsigned int virq,
+ irq_hw_number_t hwirq)
+{
+ struct irq_fwspec fwspec;
+
+ fwspec.fwnode = domain->parent->fwnode;
+ fwspec.param_count = 2;
+ fwspec.param[0] = hwirq;
+ fwspec.param[1] = IRQ_TYPE_EDGE_RISING;
+
+ return irq_domain_alloc_irqs_parent(domain, virq, 1, &fwspec);
+}
+
+static int hv_pci_vec_irq_domain_alloc(struct irq_domain *domain,
+ unsigned int virq, unsigned int nr_irqs,
+ void *args)
+{
+ irq_hw_number_t hwirq;
+ unsigned int i;
+ int ret;
+
+ ret = hv_pci_vec_alloc_device_irq(domain, nr_irqs, &hwirq);
+ if (ret)
+ return ret;
+
+ for (i = 0; i < nr_irqs; i++) {
+ ret = hv_pci_vec_irq_gic_domain_alloc(domain, virq + i,
+ hwirq + i);
+ if (ret)
+ goto free_irq;
+
+ ret = irq_domain_set_hwirq_and_chip(domain, virq + i,
+ hwirq + i,
+ &hv_arm64_msi_irq_chip,
+ domain->host_data);
+ if (ret)
+ goto free_irq;
+
+ pr_debug("pID:%d vID:%u\n", (int)(hwirq + i), virq + i);
+ }
+
+ return 0;
+
+free_irq:
+ hv_pci_vec_irq_domain_free(domain, virq, nr_irqs);
+
+ return ret;
+}
+
+static int hv_pci_vec_irq_domain_activate(struct irq_domain *domain,
+ struct irq_data *irqd, bool reserve)
+{
+ static int cpu;
+
+ /*
+ * Pick a cpu using round-robin as the irq affinity that can be
+ * temporarily used for composing MSI from the hypervisor. GIC
+ * will eventually set the right affinity for the irq and the
+ * 'unmask' will retarget the interrupt to that cpu.
+ */
+ if (cpu >= cpumask_last(cpu_online_mask))
+ cpu = 0;
+ cpu = cpumask_next(cpu, cpu_online_mask);
+ irq_data_update_effective_affinity(irqd, cpumask_of(cpu));
+
+ return 0;
+}
+
+static const struct irq_domain_ops hv_pci_domain_ops = {
+ .alloc = hv_pci_vec_irq_domain_alloc,
+ .free = hv_pci_vec_irq_domain_free,
+ .activate = hv_pci_vec_irq_domain_activate,
+};
+
+static int hv_pci_irqchip_init(void)
+{
+ static struct hv_pci_chip_data *chip_data;
+ struct fwnode_handle *fn = NULL;
+ int ret = -ENOMEM;
+
+ chip_data = kzalloc(sizeof(*chip_data), GFP_KERNEL);
+ if (!chip_data)
+ return ret;
+
+ mutex_init(&chip_data->map_lock);
+ fn = irq_domain_alloc_named_fwnode("Hyper-V ARM64 vPCI");
+ if (!fn)
+ goto free_chip;
+
+ /*
+ * IRQ domain once enabled, should not be removed since there is no
+ * way to ensure that all the corresponding devices are also gone and
+ * no interrupts will be generated.
+ */
+ hv_msi_gic_irq_domain = acpi_irq_create_hierarchy(0, HV_PCI_MSI_SPI_NR,
+ fn, &hv_pci_domain_ops,
+ chip_data);
+
+ if (!hv_msi_gic_irq_domain) {
+ pr_err("Failed to create Hyper-V ARMV vPCI MSI IRQ domain\n");
+ goto free_chip;
+ }
+
+ return 0;
+
+free_chip:
+ kfree(chip_data);
+ if (fn)
+ irq_domain_free_fwnode(fn);
+
+ return ret;
+}
+
+static struct irq_domain *hv_pci_get_root_domain(void)
+{
+ return hv_msi_gic_irq_domain;
+}
+#endif //CONFIG_ARM64

/**
* hv_pci_generic_compl() - Invoked for a completion packet
@@ -1227,6 +1427,8 @@ static void hv_msi_free(struct irq_domain *domain, struct msi_domain_info *info,
static void hv_irq_mask(struct irq_data *data)
{
pci_msi_mask_irq(data);
+ if (data->parent_data->chip->irq_mask)
+ irq_chip_mask_parent(data);
}

/**
@@ -1343,6 +1545,8 @@ static void hv_irq_unmask(struct irq_data *data)
dev_err(&hbus->hdev->device,
"%s() failed: %#llx", __func__, res);

+ if (data->parent_data->chip->irq_unmask)
+ irq_chip_unmask_parent(data);
pci_msi_unmask_irq(data);
}

@@ -1619,6 +1823,7 @@ static struct irq_chip hv_msi_irq_chip = {
.irq_compose_msi_msg = hv_compose_msi_msg,
.irq_set_affinity = irq_chip_set_affinity_parent,
.irq_ack = irq_chip_ack_parent,
+ .irq_eoi = irq_chip_eoi_parent,
.irq_mask = hv_irq_mask,
.irq_unmask = hv_irq_unmask,
};
--
2.25.1

2021-11-10 00:28:21

by Sunil Muthuswamy

[permalink] [raw]

Subject: [PATCH v4 1/2] PCI: hv: Make the code arch neutral by adding arch specific interfaces

From: Sunil Muthuswamy <[email protected]>

Encapsulate arch dependencies in Hyper-V vPCI through a set of
arch-dependent interfaces. Adding these arch specific interfaces will
allow for an implementation for other architectures, such as arm64.

There are no functional changes expected from this patch.

Signed-off-by: Sunil Muthuswamy <[email protected]>
---
In v2, v3 & v4:
Changes are described in the cover letter.

arch/x86/include/asm/hyperv-tlfs.h | 33 ++++++++++++
arch/x86/include/asm/mshyperv.h | 7 ---
drivers/pci/controller/pci-hyperv.c | 79 ++++++++++++++++++++---------
include/asm-generic/hyperv-tlfs.h | 33 ------------
4 files changed, 87 insertions(+), 65 deletions(-)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 2322d6bd5883..fdf3d28fbdd5 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -585,6 +585,39 @@ enum hv_interrupt_type {
HV_X64_INTERRUPT_TYPE_MAXIMUM = 0x000A,
};

+union hv_msi_address_register {
+ u32 as_uint32;
+ struct {
+ u32 reserved1:2;
+ u32 destination_mode:1;
+ u32 redirection_hint:1;
+ u32 reserved2:8;
+ u32 destination_id:8;
+ u32 msi_base:12;
+ };
+} __packed;
+
+union hv_msi_data_register {
+ u32 as_uint32;
+ struct {
+ u32 vector:8;
+ u32 delivery_mode:3;
+ u32 reserved1:3;
+ u32 level_assert:1;
+ u32 trigger_mode:1;
+ u32 reserved2:16;
+ };
+} __packed;
+
+/* HvRetargetDeviceInterrupt hypercall */
+union hv_msi_entry {
+ u64 as_uint64;
+ struct {
+ union hv_msi_address_register address;
+ union hv_msi_data_register data;
+ } __packed;
+};
+
#include <asm-generic/hyperv-tlfs.h>

#endif
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index adccbc209169..c2b9ab94408e 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -176,13 +176,6 @@ bool hv_vcpu_is_preempted(int vcpu);
static inline void hv_apic_init(void) {}
#endif

-static inline void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry,
- struct msi_desc *msi_desc)
-{
- msi_entry->address.as_uint32 = msi_desc->msg.address_lo;
- msi_entry->data.as_uint32 = msi_desc->msg.data;
-}
-
struct irq_domain *hv_create_pci_msi_domain(void);

int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index eaec915ffe62..03e07a4f0e3f 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -43,9 +43,6 @@
#include <linux/pci-ecam.h>
#include <linux/delay.h>
#include <linux/semaphore.h>
-#include <linux/irqdomain.h>
-#include <asm/irqdomain.h>
-#include <asm/apic.h>
#include <linux/irq.h>
#include <linux/msi.h>
#include <linux/hyperv.h>
@@ -583,6 +580,42 @@ struct hv_pci_compl {

static void hv_pci_onchannelcallback(void *context);

+#ifdef CONFIG_X86
+#define DELIVERY_MODE APIC_DELIVERY_MODE_FIXED
+#define FLOW_HANDLER handle_edge_irq
+#define FLOW_NAME "edge"
+
+static int hv_pci_irqchip_init(void)
+{
+ return 0;
+}
+
+static struct irq_domain *hv_pci_get_root_domain(void)
+{
+ return x86_vector_domain;
+}
+
+static unsigned int hv_msi_get_int_vector(struct irq_data *data)
+{
+ struct irq_cfg *cfg = irqd_cfg(data);
+
+ return cfg->vector;
+}
+
+static void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry,
+ struct msi_desc *msi_desc)
+{
+ msi_entry->address.as_uint32 = msi_desc->msg.address_lo;
+ msi_entry->data.as_uint32 = msi_desc->msg.data;
+}
+
+static int hv_msi_prepare(struct irq_domain *domain, struct device *dev,
+ int nvec, msi_alloc_info_t *info)
+{
+ return pci_msi_prepare(domain, dev, nvec, info);
+}
+#endif // CONFIG_X86
+
/**
* hv_pci_generic_compl() - Invoked for a completion packet
* @context: Set up by the sender of the packet.
@@ -1191,14 +1224,6 @@ static void hv_msi_free(struct irq_domain *domain, struct msi_domain_info *info,
put_pcichild(hpdev);
}

-static int hv_set_affinity(struct irq_data *data, const struct cpumask *dest,
- bool force)
-{
- struct irq_data *parent = data->parent_data;
-
- return parent->chip->irq_set_affinity(parent, dest, force);
-}
-
static void hv_irq_mask(struct irq_data *data)
{
pci_msi_mask_irq(data);
@@ -1217,7 +1242,6 @@ static void hv_irq_mask(struct irq_data *data)
static void hv_irq_unmask(struct irq_data *data)
{
struct msi_desc *msi_desc = irq_data_get_msi_desc(data);
- struct irq_cfg *cfg = irqd_cfg(data);
struct hv_retarget_device_interrupt *params;
struct hv_pcibus_device *hbus;
struct cpumask *dest;
@@ -1246,7 +1270,7 @@ static void hv_irq_unmask(struct irq_data *data)
(hbus->hdev->dev_instance.b[7] << 8) |
(hbus->hdev->dev_instance.b[6] & 0xf8) |
PCI_FUNC(pdev->devfn);
- params->int_target.vector = cfg->vector;
+ params->int_target.vector = hv_msi_get_int_vector(data);

/*
* Honoring apic->delivery_mode set to APIC_DELIVERY_MODE_FIXED by
@@ -1347,7 +1371,7 @@ static u32 hv_compose_msi_req_v1(
int_pkt->wslot.slot = slot;
int_pkt->int_desc.vector = vector;
int_pkt->int_desc.vector_count = 1;
- int_pkt->int_desc.delivery_mode = APIC_DELIVERY_MODE_FIXED;
+ int_pkt->int_desc.delivery_mode = DELIVERY_MODE;

/*
* Create MSI w/ dummy vCPU set, overwritten by subsequent retarget in
@@ -1377,7 +1401,7 @@ static u32 hv_compose_msi_req_v2(
int_pkt->wslot.slot = slot;
int_pkt->int_desc.vector = vector;
int_pkt->int_desc.vector_count = 1;
- int_pkt->int_desc.delivery_mode = APIC_DELIVERY_MODE_FIXED;
+ int_pkt->int_desc.delivery_mode = DELIVERY_MODE;
cpu = hv_compose_msi_req_get_cpu(affinity);
int_pkt->int_desc.processor_array[0] =
hv_cpu_number_to_vp_number(cpu);
@@ -1397,7 +1421,7 @@ static u32 hv_compose_msi_req_v3(
int_pkt->int_desc.vector = vector;
int_pkt->int_desc.reserved = 0;
int_pkt->int_desc.vector_count = 1;
- int_pkt->int_desc.delivery_mode = APIC_DELIVERY_MODE_FIXED;
+ int_pkt->int_desc.delivery_mode = DELIVERY_MODE;
cpu = hv_compose_msi_req_get_cpu(affinity);
int_pkt->int_desc.processor_array[0] =
hv_cpu_number_to_vp_number(cpu);
@@ -1419,7 +1443,6 @@ static u32 hv_compose_msi_req_v3(
*/
static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
{
- struct irq_cfg *cfg = irqd_cfg(data);
struct hv_pcibus_device *hbus;
struct vmbus_channel *channel;
struct hv_pci_dev *hpdev;
@@ -1470,7 +1493,7 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
size = hv_compose_msi_req_v1(&ctxt.int_pkts.v1,
dest,
hpdev->desc.win_slot.slot,
- cfg->vector);
+ hv_msi_get_int_vector(data));
break;

case PCI_PROTOCOL_VERSION_1_2:
@@ -1478,14 +1501,14 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
size = hv_compose_msi_req_v2(&ctxt.int_pkts.v2,
dest,
hpdev->desc.win_slot.slot,
- cfg->vector);
+ hv_msi_get_int_vector(data));
break;

case PCI_PROTOCOL_VERSION_1_4:
size = hv_compose_msi_req_v3(&ctxt.int_pkts.v3,
dest,
hpdev->desc.win_slot.slot,
- cfg->vector);
+ hv_msi_get_int_vector(data));
break;

default:
@@ -1594,14 +1617,14 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
static struct irq_chip hv_msi_irq_chip = {
.name = "Hyper-V PCIe MSI",
.irq_compose_msi_msg = hv_compose_msi_msg,
- .irq_set_affinity = hv_set_affinity,
+ .irq_set_affinity = irq_chip_set_affinity_parent,
.irq_ack = irq_chip_ack_parent,
.irq_mask = hv_irq_mask,
.irq_unmask = hv_irq_unmask,
};

static struct msi_domain_ops hv_msi_ops = {
- .msi_prepare = pci_msi_prepare,
+ .msi_prepare = hv_msi_prepare,
.msi_free = hv_msi_free,
};

@@ -1625,12 +1648,12 @@ static int hv_pcie_init_irq_domain(struct hv_pcibus_device *hbus)
hbus->msi_info.flags = (MSI_FLAG_USE_DEF_DOM_OPS |
MSI_FLAG_USE_DEF_CHIP_OPS | MSI_FLAG_MULTI_PCI_MSI |
MSI_FLAG_PCI_MSIX);
- hbus->msi_info.handler = handle_edge_irq;
- hbus->msi_info.handler_name = "edge";
+ hbus->msi_info.handler = FLOW_HANDLER;
+ hbus->msi_info.handler_name = FLOW_NAME;
hbus->msi_info.data = hbus;
hbus->irq_domain = pci_msi_create_irq_domain(hbus->fwnode,
&hbus->msi_info,
- x86_vector_domain);
+ hv_pci_get_root_domain());
if (!hbus->irq_domain) {
dev_err(&hbus->hdev->device,
"Failed to build an MSI IRQ domain\n");
@@ -3535,9 +3558,15 @@ static void __exit exit_hv_pci_drv(void)

static int __init init_hv_pci_drv(void)
{
+ int ret;
+
if (!hv_is_hyperv_initialized())
return -ENODEV;

+ ret = hv_pci_irqchip_init();
+ if (ret)
+ return ret;
+
/* Set the invalid domain number's bit, so it will not be used */
set_bit(HVPCI_DOM_INVALID, hvpci_dom_map);

diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index 56348a541c50..45cc0c3b8ed7 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -539,39 +539,6 @@ enum hv_interrupt_source {
HV_INTERRUPT_SOURCE_IOAPIC,
};

-union hv_msi_address_register {
- u32 as_uint32;
- struct {
- u32 reserved1:2;
- u32 destination_mode:1;
- u32 redirection_hint:1;
- u32 reserved2:8;
- u32 destination_id:8;
- u32 msi_base:12;
- };
-} __packed;
-
-union hv_msi_data_register {
- u32 as_uint32;
- struct {
- u32 vector:8;
- u32 delivery_mode:3;
- u32 reserved1:3;
- u32 level_assert:1;
- u32 trigger_mode:1;
- u32 reserved2:16;
- };
-} __packed;
-
-/* HvRetargetDeviceInterrupt hypercall */
-union hv_msi_entry {
- u64 as_uint64;
- struct {
- union hv_msi_address_register address;
- union hv_msi_data_register data;
- } __packed;
-};
-
union hv_ioapic_rte {
u64 as_uint64;

--
2.25.1

2021-11-10 13:23:13

by Marc Zyngier

[permalink] [raw]

Subject: Re: [PATCH v4 2/2] arm64: PCI: hv: Add support for Hyper-V vPCI

On Tue, 09 Nov 2021 22:14:20 +0000,
Sunil Muthuswamy <[email protected]> wrote:
>
> From: Sunil Muthuswamy <[email protected]>
>
> Add support for Hyper-V vPCI for arm64 by implementing the arch specific
> interfaces. Introduce an IRQ domain and chip specific to Hyper-v vPCI that
> is based on SPIs. The IRQ domain parents itself to the arch GIC IRQ domain
> for basic vector management.
>
> Signed-off-by: Sunil Muthuswamy <[email protected]>
> ---
> In v2, v3 & v4:
> Changes are described in the cover letter.
>
> arch/arm64/include/asm/hyperv-tlfs.h | 9 ++
> drivers/pci/Kconfig | 2 +-
> drivers/pci/controller/Kconfig | 2 +-
> drivers/pci/controller/pci-hyperv.c | 207 ++++++++++++++++++++++++++-
> 4 files changed, 217 insertions(+), 3 deletions(-)

[...]

> +static int hv_pci_vec_irq_domain_activate(struct irq_domain *domain,
> + struct irq_data *irqd, bool reserve)
> +{
> + static int cpu;
> +
> + /*
> + * Pick a cpu using round-robin as the irq affinity that can be
> + * temporarily used for composing MSI from the hypervisor. GIC
> + * will eventually set the right affinity for the irq and the
> + * 'unmask' will retarget the interrupt to that cpu.
> + */
> + if (cpu >= cpumask_last(cpu_online_mask))
> + cpu = 0;
> + cpu = cpumask_next(cpu, cpu_online_mask);
> + irq_data_update_effective_affinity(irqd, cpumask_of(cpu));

The mind boggles.

Let's imagine a single machine. cpu_online_mask only has bit 0 set,
and nr_cpumask_bits is 1. This is the first run, and cpu is 1:

cpu = cpumask_next(cpu, cpu_online_mask);

cpu is now set to 1. Which is not a valid CPU number, but a valid
return value indicating that there is no next CPU as it is equal to
nr_cpumask_bits. cpumask_of(cpu) will then diligently return crap,
which you carefully store into the irq descriptor. The IRQ subsystem
thanks you.

The same reasoning applies to any number of CPUs, and you obviously
never checked what any of this does :-(. As to what the behaviour is
when multiple CPUs run this function in parallel, let's not even
bother (locking is overrated).

Logic and concurrency issues aside, why do you even bother setting
some round-robin affinity if all you need is to set *something* so
that a hypervisor message can be composed? Why not use the first
online CPU? At least it will be correct.

M.

--
Without deviation from the norm, progress is not possible.

2021-11-10 13:31:00

by Marc Zyngier

[permalink] [raw]

Subject: Re: [PATCH v4 2/2] arm64: PCI: hv: Add support for Hyper-V vPCI

On Wed, 10 Nov 2021 13:20:32 +0000,
Marc Zyngier <[email protected]> wrote:
>
> On Tue, 09 Nov 2021 22:14:20 +0000,
> Sunil Muthuswamy <[email protected]> wrote:
> >
> > From: Sunil Muthuswamy <[email protected]>
> >
> > Add support for Hyper-V vPCI for arm64 by implementing the arch specific
> > interfaces. Introduce an IRQ domain and chip specific to Hyper-v vPCI that
> > is based on SPIs. The IRQ domain parents itself to the arch GIC IRQ domain
> > for basic vector management.
> >
> > Signed-off-by: Sunil Muthuswamy <[email protected]>
> > ---
> > In v2, v3 & v4:
> > Changes are described in the cover letter.
> >
> > arch/arm64/include/asm/hyperv-tlfs.h | 9 ++
> > drivers/pci/Kconfig | 2 +-
> > drivers/pci/controller/Kconfig | 2 +-
> > drivers/pci/controller/pci-hyperv.c | 207 ++++++++++++++++++++++++++-
> > 4 files changed, 217 insertions(+), 3 deletions(-)
>
> [...]
>
> > +static int hv_pci_vec_irq_domain_activate(struct irq_domain *domain,
> > + struct irq_data *irqd, bool reserve)
> > +{
> > + static int cpu;
> > +
> > + /*
> > + * Pick a cpu using round-robin as the irq affinity that can be
> > + * temporarily used for composing MSI from the hypervisor. GIC
> > + * will eventually set the right affinity for the irq and the
> > + * 'unmask' will retarget the interrupt to that cpu.
> > + */
> > + if (cpu >= cpumask_last(cpu_online_mask))
> > + cpu = 0;
> > + cpu = cpumask_next(cpu, cpu_online_mask);
> > + irq_data_update_effective_affinity(irqd, cpumask_of(cpu));
>
> The mind boggles.
>
> Let's imagine a single machine. cpu_online_mask only has bit 0 set,

single *CPU* machine

> and nr_cpumask_bits is 1. This is the first run, and cpu is 1:

cpu is *obviously* 0:

>
> cpu = cpumask_next(cpu, cpu_online_mask);
>
> cpu is now set to 1. Which is not a valid CPU number, but a valid
> return value indicating that there is no next CPU as it is equal to
> nr_cpumask_bits. cpumask_of(cpu) will then diligently return crap,
> which you carefully store into the irq descriptor. The IRQ subsystem
> thanks you.
>
> The same reasoning applies to any number of CPUs, and you obviously
> never checked what any of this does :-(. As to what the behaviour is
> when multiple CPUs run this function in parallel, let's not even
> bother (locking is overrated).
>
> Logic and concurrency issues aside, why do you even bother setting
> some round-robin affinity if all you need is to set *something* so
> that a hypervisor message can be composed? Why not use the first
> online CPU? At least it will be correct.

Everything else holds.

M.

--
Without deviation from the norm, progress is not possible.

2021-11-10 17:52:34

by Sunil Muthuswamy

[permalink] [raw]

Subject: RE: [EXTERNAL] Re: [PATCH v4 2/2] arm64: PCI: hv: Add support for Hyper-V vPCI

On Wednesday, November 10, 2021 5:26 AM,
Marc Zyngier <[email protected]> wrote:

> > On Tue, 09 Nov 2021 22:14:20 +0000,
> > Sunil Muthuswamy <[email protected]> wrote:
> > >
> > > From: Sunil Muthuswamy <[email protected]>
> > >
> > > Add support for Hyper-V vPCI for arm64 by implementing the arch specific
> > > interfaces. Introduce an IRQ domain and chip specific to Hyper-v vPCI that
> > > is based on SPIs. The IRQ domain parents itself to the arch GIC IRQ domain
> > > for basic vector management.
> > >
> > > Signed-off-by: Sunil Muthuswamy <[email protected]>
> > > ---
> > > In v2, v3 & v4:
> > > Changes are described in the cover letter.
> > >
> > > arch/arm64/include/asm/hyperv-tlfs.h | 9 ++
> > > drivers/pci/Kconfig | 2 +-
> > > drivers/pci/controller/Kconfig | 2 +-
> > > drivers/pci/controller/pci-hyperv.c | 207 ++++++++++++++++++++++++++-
> > > 4 files changed, 217 insertions(+), 3 deletions(-)
> >
> > [...]
> >
> > > +static int hv_pci_vec_irq_domain_activate(struct irq_domain *domain,
> > > + struct irq_data *irqd, bool reserve)
> > > +{
> > > + static int cpu;
> > > +
> > > + /*
> > > + * Pick a cpu using round-robin as the irq affinity that can be
> > > + * temporarily used for composing MSI from the hypervisor. GIC
> > > + * will eventually set the right affinity for the irq and the
> > > + * 'unmask' will retarget the interrupt to that cpu.
> > > + */
> > > + if (cpu >= cpumask_last(cpu_online_mask))
> > > + cpu = 0;
> > > + cpu = cpumask_next(cpu, cpu_online_mask);
> > > + irq_data_update_effective_affinity(irqd, cpumask_of(cpu));
> >
> > The mind boggles.
> >
> > Let's imagine a single machine. cpu_online_mask only has bit 0 set,
>
> single *CPU* machine
>
> > and nr_cpumask_bits is 1. This is the first run, and cpu is 1:
>
> cpu is *obviously* 0:
>
> >
> > cpu = cpumask_next(cpu, cpu_online_mask);
> >
> > cpu is now set to 1. Which is not a valid CPU number, but a valid
> > return value indicating that there is no next CPU as it is equal to
> > nr_cpumask_bits. cpumask_of(cpu) will then diligently return crap,
> > which you carefully store into the irq descriptor. The IRQ subsystem
> > thanks you.
> >
> > The same reasoning applies to any number of CPUs, and you obviously
> > never checked what any of this does :-(. As to what the behaviour is
> > when multiple CPUs run this function in parallel, let's not even
> > bother (locking is overrated).
> >
> > Logic and concurrency issues aside, why do you even bother setting
> > some round-robin affinity if all you need is to set *something* so
> > that a hypervisor message can be composed? Why not use the first
> > online CPU? At least it will be correct.
>
> Everything else holds.
>
> M.

Good call on not being able to pick cpu 0 and that being a problem for
single cpu system. The cpu initialization should have been '-1' to be able
to successfully pick cpu 0.

I don't see concurrency an issue because this was a best-case effort to
randomize the interrupt distribution across cpu's. So, even if two irq's
ended up with the same cpu, that will still work.

I also had thoughts of just using the first online cpu since this is just
temporary. So, I will go with that as that will also simplify things. Thanks
for your feedback.

- Sunil

2021-11-10 22:48:33

by Sunil Muthuswamy

[permalink] [raw]

Subject: RE: [EXTERNAL] Re: [PATCH v4 2/2] arm64: PCI: hv: Add support for Hyper-V vPCI

> > > On Tue, 09 Nov 2021 22:14:20 +0000,
> > > Sunil Muthuswamy <[email protected]> wrote:
> > > >
> > > > From: Sunil Muthuswamy <[email protected]>
> > > >
> > > > Add support for Hyper-V vPCI for arm64 by implementing the arch specific
> > > > interfaces. Introduce an IRQ domain and chip specific to Hyper-v vPCI that
> > > > is based on SPIs. The IRQ domain parents itself to the arch GIC IRQ domain
> > > > for basic vector management.
> > > >
> > > > Signed-off-by: Sunil Muthuswamy <[email protected]>
> > > > ---
> > > > In v2, v3 & v4:
> > > > Changes are described in the cover letter.
> > > >
> > > > arch/arm64/include/asm/hyperv-tlfs.h | 9 ++
> > > > drivers/pci/Kconfig | 2 +-
> > > > drivers/pci/controller/Kconfig | 2 +-
> > > > drivers/pci/controller/pci-hyperv.c | 207 ++++++++++++++++++++++++++-
> > > > 4 files changed, 217 insertions(+), 3 deletions(-)
> > >
> > > [...]
> > >
> > > > +static int hv_pci_vec_irq_domain_activate(struct irq_domain *domain,
> > > > + struct irq_data *irqd, bool reserve)
> > > > +{
> > > > + static int cpu;
> > > > +
> > > > + /*
> > > > + * Pick a cpu using round-robin as the irq affinity that can be
> > > > + * temporarily used for composing MSI from the hypervisor. GIC
> > > > + * will eventually set the right affinity for the irq and the
> > > > + * 'unmask' will retarget the interrupt to that cpu.
> > > > + */
> > > > + if (cpu >= cpumask_last(cpu_online_mask))
> > > > + cpu = 0;
> > > > + cpu = cpumask_next(cpu, cpu_online_mask);
> > > > + irq_data_update_effective_affinity(irqd, cpumask_of(cpu));
> > >
> > > The mind boggles.
> > >
> > > Let's imagine a single machine. cpu_online_mask only has bit 0 set,
> >
> > single *CPU* machine
> >
> > > and nr_cpumask_bits is 1. This is the first run, and cpu is 1:
> >
> > cpu is *obviously* 0:
> >
> > >
> > > cpu = cpumask_next(cpu, cpu_online_mask);
> > >
> > > cpu is now set to 1. Which is not a valid CPU number, but a valid
> > > return value indicating that there is no next CPU as it is equal to
> > > nr_cpumask_bits. cpumask_of(cpu) will then diligently return crap,
> > > which you carefully store into the irq descriptor. The IRQ subsystem
> > > thanks you.
> > >
> > > The same reasoning applies to any number of CPUs, and you obviously
> > > never checked what any of this does :-(. As to what the behaviour is
> > > when multiple CPUs run this function in parallel, let's not even
> > > bother (locking is overrated).
> > >
> > > Logic and concurrency issues aside, why do you even bother setting
> > > some round-robin affinity if all you need is to set *something* so
> > > that a hypervisor message can be composed? Why not use the first
> > > online CPU? At least it will be correct.
> >
> > Everything else holds.
> >
> > M.
>
> Good call on not being able to pick cpu 0 and that being a problem for
> single cpu system. The cpu initialization should have been '-1' to be able
> to successfully pick cpu 0.
>
> I don't see concurrency an issue because this was a best-case effort to
> randomize the interrupt distribution across cpu's. So, even if two irq's
> ended up with the same cpu, that will still work.
>
> I also had thoughts of just using the first online cpu since this is just
> temporary. So, I will go with that as that will also simplify things. Thanks
> for your feedback.
>
> - Sunil

But, yes, for the concurrency, I do see a possibility of a race condition
with the last cpu check and 'cpumask_next' call where it could lead
to a failure. v5 moves this to the first online cpu and that should
fix this issue.

- Sunil