2021-02-03 15:10:53

by Wei Liu

Subject: [PATCH v6 00/16] Introducing Linux root partition support for Microsoft Hypervisor

Hi all

Here we propose this patch series to make Linux run as the root partition [0]
on Microsoft Hypervisor [1]. There will be a subsequent patch series to provide a
device node (/dev/mshv) such that userspace programs can create and run virtual
machines. We've also ported Cloud Hypervisor [3] over and have been able to
boot a Linux guest with Virtio devices since late July 2020.

This series implements only the absolutely necessary components to get
things running. A large portion of this series consists of patches that
augment hyperv-tlfs.h. They should be rather uncontroversial and can be
applied right away.

A few key things other than the changes to hyperv-tlfs.h:

1. Linux needs to set up existing Hyper-V facilities differently.
2. Linux needs to make a few hypercalls to bring up APs.
3. Interrupts are remapped by IOMMU, which is controlled by the hypervisor.
Linux needs to make hypercalls to map and unmap interrupts. This is
done by introducing a new MSI irqdomain and extending the remapping
domain in hyperv-iommu.

This series is now based on 5.11-rc2.

Comments and suggestions are welcome.

Thanks,
Wei.

[0] Just think of it like Xen's Dom0.
[1] Hyper-V is the better-known name, but it really refers to the whole
stack, including the hypervisor and other components that run in the
Windows kernel and userspace.
[3] https://github.com/cloud-hypervisor/

Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

Changes since v5:
1. Address Michael's comments.
2. Further improve and simplify code.
3. Drop a redundant patch and add one new patch for ACPI / NUMA code.

Changes since v4:
1. Rework IO-APIC handling.

Changes since v3:
1. Fix compilation errors.
2. Adapt to upstream changes.

Changes since v2:
1. Address more comments from Vitaly.
2. Fix and test 32bit build.

Changes since v1:
1. Simplify MSI IRQ domain implementation.
2. Address Vitaly's comments.

Wei Liu (16):
asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to
HV_CPU_MANAGEMENT
x86/hyperv: detect if Linux is the root partition
Drivers: hv: vmbus: skip VMBus initialization if Linux is root
clocksource/hyperv: use MSR-based access if running as root
x86/hyperv: allocate output arg pages if required
x86/hyperv: extract partition ID from Microsoft Hypervisor if
necessary
x86/hyperv: handling hypercall page setup for root
ACPI / NUMA: add a stub function for node_to_pxm()
x86/hyperv: provide a bunch of helper functions
x86/hyperv: implement and use hv_smp_prepare_cpus
asm-generic/hyperv: update hv_msi_entry
asm-generic/hyperv: update hv_interrupt_entry
asm-generic/hyperv: introduce hv_device_id and auxiliary structures
asm-generic/hyperv: import data structures for mapping device
interrupts
x86/hyperv: implement an MSI domain for root partition
iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition

arch/x86/hyperv/Makefile | 4 +-
arch/x86/hyperv/hv_init.c | 107 +++++++-
arch/x86/hyperv/hv_proc.c | 219 ++++++++++++++++
arch/x86/hyperv/irqdomain.c | 387 ++++++++++++++++++++++++++++
arch/x86/include/asm/hyperv-tlfs.h | 23 ++
arch/x86/include/asm/mshyperv.h | 19 +-
arch/x86/kernel/cpu/mshyperv.c | 49 ++++
drivers/clocksource/hyperv_timer.c | 3 +
drivers/hv/vmbus_drv.c | 3 +
drivers/iommu/hyperv-iommu.c | 177 ++++++++++++-
drivers/pci/controller/pci-hyperv.c | 2 +-
include/acpi/acpi_numa.h | 4 +
include/asm-generic/hyperv-tlfs.h | 254 +++++++++++++++++-
13 files changed, 1230 insertions(+), 21 deletions(-)
create mode 100644 arch/x86/hyperv/hv_proc.c
create mode 100644 arch/x86/hyperv/irqdomain.c


base-commit: e71ba9452f0b5b2e8dc8aa5445198cd9214a6a62
--
2.20.1


2021-02-03 15:10:55

by Wei Liu

Subject: [PATCH v6 06/16] x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary

We will need the partition ID for executing some hypercalls later.
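
The status check used by hv_get_partition_id() can be sketched in isolation
as below. This is only an illustration: the mask and success values are
assumptions mirroring the kernel's hyperv-tlfs.h (they are not part of this
patch), and hv_status_code()/hv_status_ok() are hypothetical helper names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, mirroring the kernel's hyperv-tlfs.h; not defined by this patch. */
#define HV_HYPERCALL_RESULT_MASK 0xffffULL /* low 16 bits hold the result code */
#define HV_STATUS_SUCCESS        0

/* Hypothetical helper: extract the result code from a raw 64-bit status. */
static inline uint16_t hv_status_code(uint64_t status)
{
	return (uint16_t)(status & HV_HYPERCALL_RESULT_MASK);
}

/* Hypothetical helper: true when the result code is HV_STATUS_SUCCESS. */
static inline bool hv_status_ok(uint64_t status)
{
	return hv_status_code(status) == HV_STATUS_SUCCESS;
}
```

Only the low bits are compared, so a status whose upper bits are set can
still count as success; this is why the patch masks the value before
comparing it.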

Signed-off-by: Lillian Grassin-Drake <[email protected]>
Co-Developed-by: Sunil Muthuswamy <[email protected]>
Signed-off-by: Wei Liu <[email protected]>
---
v6:
1. Use u64 status.

v3:
1. Make hv_get_partition_id static.
2. Change code structure a bit.
---
arch/x86/hyperv/hv_init.c | 26 ++++++++++++++++++++++++++
arch/x86/include/asm/mshyperv.h | 2 ++
include/asm-generic/hyperv-tlfs.h | 6 ++++++
3 files changed, 34 insertions(+)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 6f4cb40e53fe..5b90a7290177 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -26,6 +26,9 @@
#include <linux/syscore_ops.h>
#include <clocksource/hyperv_timer.h>

+u64 hv_current_partition_id = ~0ull;
+EXPORT_SYMBOL_GPL(hv_current_partition_id);
+
void *hv_hypercall_pg;
EXPORT_SYMBOL_GPL(hv_hypercall_pg);

@@ -331,6 +334,24 @@ static struct syscore_ops hv_syscore_ops = {
.resume = hv_resume,
};

+static void __init hv_get_partition_id(void)
+{
+ struct hv_get_partition_id *output_page;
+ u64 status;
+ unsigned long flags;
+
+ local_irq_save(flags);
+ output_page = *this_cpu_ptr(hyperv_pcpu_output_arg);
+ status = hv_do_hypercall(HVCALL_GET_PARTITION_ID, NULL, output_page);
+ if ((status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS) {
+ /* No point in proceeding if this failed */
+ pr_err("Failed to get partition ID: %lld\n", status);
+ BUG();
+ }
+ hv_current_partition_id = output_page->partition_id;
+ local_irq_restore(flags);
+}
+
/*
* This function is to be invoked early in the boot sequence after the
* hypervisor has been detected.
@@ -426,6 +447,11 @@ void __init hyperv_init(void)

register_syscore_ops(&hv_syscore_ops);

+ if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_ACCESS_PARTITION_ID)
+ hv_get_partition_id();
+
+ BUG_ON(hv_root_partition && hv_current_partition_id == ~0ull);
+
return;

remove_cpuhp_state:
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 62d9390f1ddf..67f5d35a73d3 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -78,6 +78,8 @@ extern void *hv_hypercall_pg;
extern void __percpu **hyperv_pcpu_input_arg;
extern void __percpu **hyperv_pcpu_output_arg;

+extern u64 hv_current_partition_id;
+
static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
{
u64 input_address = input ? virt_to_phys(input) : 0;
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index e6903589a82a..87b1a79b19eb 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -141,6 +141,7 @@ struct ms_hyperv_tsc_page {
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX 0x0013
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX 0x0014
#define HVCALL_SEND_IPI_EX 0x0015
+#define HVCALL_GET_PARTITION_ID 0x0046
#define HVCALL_GET_VP_REGISTERS 0x0050
#define HVCALL_SET_VP_REGISTERS 0x0051
#define HVCALL_POST_MESSAGE 0x005c
@@ -407,6 +408,11 @@ struct hv_tlb_flush_ex {
u64 gva_list[];
} __packed;

+/* HvGetPartitionId hypercall (output only) */
+struct hv_get_partition_id {
+ u64 partition_id;
+} __packed;
+
/* HvRetargetDeviceInterrupt hypercall */
union hv_msi_entry {
u64 as_uint64;
--
2.20.1

2021-02-03 15:10:59

by Wei Liu

Subject: [PATCH v6 15/16] x86/hyperv: implement an MSI domain for root partition

When Linux runs as the root partition on Microsoft Hypervisor, its
interrupts are remapped. Linux will need to explicitly map and unmap
interrupts for hardware.

Implement an MSI domain to issue the correct hypercalls, and initialize
this irqdomain as the default MSI irq domain.
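
The unmap-before-map flow that hv_irq_compose_msi_msg() follows can be
sketched in userspace as below. This is a simplified model, not the kernel
code: struct entry, struct irq_state, stub_map(), stub_unmap() and
remap_irq() are all hypothetical stand-ins, and the stubs only count calls
instead of issuing hypercalls.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct hv_interrupt_entry. */
struct entry {
	uint32_t address;
	uint32_t data;
};

/* Hypothetical per-interrupt state, playing the role of chip_data. */
struct irq_state {
	bool mapped;
	struct entry entry;
};

/* Stubs standing in for the map/unmap hypercalls; they only count calls. */
static int map_calls, unmap_calls;

static int stub_map(int cpu, int vector, struct entry *out)
{
	map_calls++;
	out->address = (uint32_t)cpu;   /* made-up encoding, illustration only */
	out->data = (uint32_t)vector;
	return 0;
}

static int stub_unmap(const struct entry *old)
{
	(void)old;
	unmap_calls++;
	return 0;
}

/* Drop any existing mapping first, then map with the new target. */
static int remap_irq(struct irq_state *st, int cpu, int vector)
{
	struct entry fresh;

	if (st->mapped) {
		st->mapped = false;
		if (stub_unmap(&st->entry) != 0)
			return -1;
	}

	if (stub_map(cpu, vector, &fresh) != 0)
		return -1;

	st->entry = fresh;
	st->mapped = true;
	return 0;
}
```

The first call maps only; every later call unmaps the stored entry before
mapping again, which matches the reason given in the code comment for not
using the retarget hypercall.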

Signed-off-by: Sunil Muthuswamy <[email protected]>
Co-Developed-by: Sunil Muthuswamy <[email protected]>
Signed-off-by: Wei Liu <[email protected]>
---
v6:
1. Use u64 status.
2. Use vpset instead of bitmap.
3. Factor out hv_map_interrupt
4. Address other misc comments.

v4: Fix compilation issue when CONFIG_PCI_MSI is not set.
v3: build irqdomain.o for 32bit as well.
v2: This patch is simplified due to upstream changes.
---
arch/x86/hyperv/Makefile | 2 +-
arch/x86/hyperv/hv_init.c | 9 +
arch/x86/hyperv/irqdomain.c | 362 ++++++++++++++++++++++++++++++++
arch/x86/include/asm/mshyperv.h | 2 +
4 files changed, 374 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/hyperv/irqdomain.c

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 565358020921..48e2c51464e8 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0-only
-obj-y := hv_init.o mmu.o nested.o
+obj-y := hv_init.o mmu.o nested.o irqdomain.o
obj-$(CONFIG_X86_64) += hv_apic.o hv_proc.o

ifdef CONFIG_X86_64
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 11c5997691f4..894ce899f0cb 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -483,6 +483,15 @@ void __init hyperv_init(void)

BUG_ON(hv_root_partition && hv_current_partition_id == ~0ull);

+#ifdef CONFIG_PCI_MSI
+ /*
+ * If we're running as root, we want to create our own PCI MSI domain.
+ * We can't set this in hv_pci_init because that would be too late.
+ */
+ if (hv_root_partition)
+ x86_init.irqs.create_pci_msi_domain = hv_create_pci_msi_domain;
+#endif
+
return;

remove_cpuhp_state:
diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
new file mode 100644
index 000000000000..117f17e8c88a
--- /dev/null
+++ b/arch/x86/hyperv/irqdomain.c
@@ -0,0 +1,362 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Irqdomain for Linux to run as the root partition on Microsoft Hypervisor.
+ *
+ * Authors:
+ * Sunil Muthuswamy <[email protected]>
+ * Wei Liu <[email protected]>
+ */
+
+#include <linux/pci.h>
+#include <linux/irq.h>
+#include <asm/mshyperv.h>
+
+static int hv_map_interrupt(union hv_device_id device_id, bool level,
+ int cpu, int vector, struct hv_interrupt_entry *entry)
+{
+ struct hv_input_map_device_interrupt *input;
+ struct hv_output_map_device_interrupt *output;
+ struct hv_device_interrupt_descriptor *intr_desc;
+ unsigned long flags;
+ u64 status;
+ cpumask_t mask = CPU_MASK_NONE;
+ int nr_bank, var_size;
+
+ local_irq_save(flags);
+
+ input = *this_cpu_ptr(hyperv_pcpu_input_arg);
+ output = *this_cpu_ptr(hyperv_pcpu_output_arg);
+
+ intr_desc = &input->interrupt_descriptor;
+ memset(input, 0, sizeof(*input));
+ input->partition_id = hv_current_partition_id;
+ input->device_id = device_id.as_uint64;
+ intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
+ intr_desc->vector_count = 1;
+ intr_desc->target.vector = vector;
+
+ if (level)
+ intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
+ else
+ intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
+
+ cpumask_set_cpu(cpu, &mask);
+ intr_desc->target.vp_set.valid_bank_mask = 0;
+ intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+ nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), &mask);
+ if (nr_bank < 0) {
+ local_irq_restore(flags);
+ pr_err("%s: unable to generate VP set\n", __func__);
+ return EINVAL;
+ }
+ intr_desc->target.flags = HV_DEVICE_INTERRUPT_TARGET_PROCESSOR_SET;
+
+ /*
+ * var-sized hypercall, var-size starts after vp_mask (thus
+ * vp_set.format does not count, but vp_set.valid_bank_mask
+ * does).
+ */
+ var_size = nr_bank + 1;
+
+ status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, var_size,
+ input, output);
+ *entry = output->interrupt_entry;
+
+ local_irq_restore(flags);
+
+ if ((status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS)
+ pr_err("%s: hypercall failed, status %lld\n", __func__, status);
+
+ return status & HV_HYPERCALL_RESULT_MASK;
+}
+
+static int hv_unmap_interrupt(u64 id, struct hv_interrupt_entry *old_entry)
+{
+ unsigned long flags;
+ struct hv_input_unmap_device_interrupt *input;
+ struct hv_interrupt_entry *intr_entry;
+ u64 status;
+
+ local_irq_save(flags);
+ input = *this_cpu_ptr(hyperv_pcpu_input_arg);
+
+ memset(input, 0, sizeof(*input));
+ intr_entry = &input->interrupt_entry;
+ input->partition_id = hv_current_partition_id;
+ input->device_id = id;
+ *intr_entry = *old_entry;
+
+ status = hv_do_hypercall(HVCALL_UNMAP_DEVICE_INTERRUPT, input, NULL);
+ local_irq_restore(flags);
+
+ return status & HV_HYPERCALL_RESULT_MASK;
+}
+
+#ifdef CONFIG_PCI_MSI
+struct rid_data {
+ struct pci_dev *bridge;
+ u32 rid;
+};
+
+static int get_rid_cb(struct pci_dev *pdev, u16 alias, void *data)
+{
+ struct rid_data *rd = data;
+ u8 bus = PCI_BUS_NUM(rd->rid);
+
+ if (pdev->bus->number != bus || PCI_BUS_NUM(alias) != bus) {
+ rd->bridge = pdev;
+ rd->rid = alias;
+ }
+
+ return 0;
+}
+
+static union hv_device_id hv_build_pci_dev_id(struct pci_dev *dev)
+{
+ union hv_device_id dev_id;
+ struct rid_data data = {
+ .bridge = NULL,
+ .rid = PCI_DEVID(dev->bus->number, dev->devfn)
+ };
+
+ pci_for_each_dma_alias(dev, get_rid_cb, &data);
+
+ dev_id.as_uint64 = 0;
+ dev_id.device_type = HV_DEVICE_TYPE_PCI;
+ dev_id.pci.segment = pci_domain_nr(dev->bus);
+
+ dev_id.pci.bdf.bus = PCI_BUS_NUM(data.rid);
+ dev_id.pci.bdf.device = PCI_SLOT(data.rid);
+ dev_id.pci.bdf.function = PCI_FUNC(data.rid);
+ dev_id.pci.source_shadow = HV_SOURCE_SHADOW_NONE;
+
+ if (data.bridge) {
+ int pos;
+
+ /*
+ * Microsoft Hypervisor requires a bus range when the bridge is
+ * running in PCI-X mode.
+ *
+ * To distinguish conventional vs PCI-X bridge, we can check
+ * the bridge's PCI-X Secondary Status Register, Secondary Bus
+ * Mode and Frequency bits. See PCI Express to PCI/PCI-X Bridge
+ * Specification Revision 1.0 5.2.2.1.3.
+ *
+ * Value zero means it is in conventional mode, otherwise it is
+ * in PCI-X mode.
+ */
+
+ pos = pci_find_capability(data.bridge, PCI_CAP_ID_PCIX);
+ if (pos) {
+ u16 status;
+
+ pci_read_config_word(data.bridge, pos +
+ PCI_X_BRIDGE_SSTATUS, &status);
+
+ if (status & PCI_X_SSTATUS_FREQ) {
+ /* Non-zero, PCI-X mode */
+ u8 sec_bus, sub_bus;
+
+ dev_id.pci.source_shadow = HV_SOURCE_SHADOW_BRIDGE_BUS_RANGE;
+
+ pci_read_config_byte(data.bridge, PCI_SECONDARY_BUS, &sec_bus);
+ dev_id.pci.shadow_bus_range.secondary_bus = sec_bus;
+ pci_read_config_byte(data.bridge, PCI_SUBORDINATE_BUS, &sub_bus);
+ dev_id.pci.shadow_bus_range.subordinate_bus = sub_bus;
+ }
+ }
+ }
+
+ return dev_id;
+}
+
+static int hv_map_msi_interrupt(struct pci_dev *dev, int cpu, int vector,
+ struct hv_interrupt_entry *entry)
+{
+ union hv_device_id device_id = hv_build_pci_dev_id(dev);
+
+ return hv_map_interrupt(device_id, false, cpu, vector, entry);
+}
+
+static inline void entry_to_msi_msg(struct hv_interrupt_entry *entry, struct msi_msg *msg)
+{
+ /* High address is always 0 */
+ msg->address_hi = 0;
+ msg->address_lo = entry->msi_entry.address.as_uint32;
+ msg->data = entry->msi_entry.data.as_uint32;
+}
+
+static int hv_unmap_msi_interrupt(struct pci_dev *dev, struct hv_interrupt_entry *old_entry);
+static void hv_irq_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
+{
+ struct msi_desc *msidesc;
+ struct pci_dev *dev;
+ struct hv_interrupt_entry out_entry, *stored_entry;
+ struct irq_cfg *cfg = irqd_cfg(data);
+ cpumask_t *affinity;
+ int cpu;
+ u64 status;
+
+ msidesc = irq_data_get_msi_desc(data);
+ dev = msi_desc_to_pci_dev(msidesc);
+
+ if (!cfg) {
+ pr_debug("%s: cfg is NULL\n", __func__);
+ return;
+ }
+
+ affinity = irq_data_get_effective_affinity_mask(data);
+ cpu = cpumask_first_and(affinity, cpu_online_mask);
+
+ if (data->chip_data) {
+ /*
+ * This interrupt is already mapped. Let's unmap first.
+ *
+ * We don't use retarget interrupt hypercalls here because
+ * Microsoft Hypervisor doesn't allow root to change the vector
+ * or specify VPs outside of the set that is initially used
+ * during mapping.
+ */
+ stored_entry = data->chip_data;
+ data->chip_data = NULL;
+
+ status = hv_unmap_msi_interrupt(dev, stored_entry);
+
+ kfree(stored_entry);
+
+ if (status != HV_STATUS_SUCCESS) {
+ pr_debug("%s: failed to unmap, status %lld\n", __func__, status);
+ return;
+ }
+ }
+
+ stored_entry = kzalloc(sizeof(*stored_entry), GFP_ATOMIC);
+ if (!stored_entry) {
+ pr_debug("%s: failed to allocate chip data\n", __func__);
+ return;
+ }
+
+ status = hv_map_msi_interrupt(dev, cpu, cfg->vector, &out_entry);
+ if (status != HV_STATUS_SUCCESS) {
+ kfree(stored_entry);
+ return;
+ }
+
+ *stored_entry = out_entry;
+ data->chip_data = stored_entry;
+ entry_to_msi_msg(&out_entry, msg);
+
+ return;
+}
+
+static int hv_unmap_msi_interrupt(struct pci_dev *dev, struct hv_interrupt_entry *old_entry)
+{
+ return hv_unmap_interrupt(hv_build_pci_dev_id(dev).as_uint64, old_entry);
+}
+
+static void hv_teardown_msi_irq_common(struct pci_dev *dev, struct msi_desc *msidesc, int irq)
+{
+ u64 status;
+ struct hv_interrupt_entry old_entry;
+ struct irq_desc *desc;
+ struct irq_data *data;
+ struct msi_msg msg;
+
+ desc = irq_to_desc(irq);
+ if (!desc) {
+ pr_debug("%s: no irq desc\n", __func__);
+ return;
+ }
+
+ data = &desc->irq_data;
+ if (!data) {
+ pr_debug("%s: no irq data\n", __func__);
+ return;
+ }
+
+ if (!data->chip_data) {
+ pr_debug("%s: no chip data!\n", __func__);
+ return;
+ }
+
+ old_entry = *(struct hv_interrupt_entry *)data->chip_data;
+ entry_to_msi_msg(&old_entry, &msg);
+
+ kfree(data->chip_data);
+ data->chip_data = NULL;
+
+ status = hv_unmap_msi_interrupt(dev, &old_entry);
+
+ if (status != HV_STATUS_SUCCESS) {
+ pr_err("%s: hypercall failed, status %lld\n", __func__, status);
+ return;
+ }
+}
+
+static void hv_msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
+{
+ int i;
+ struct msi_desc *entry;
+ struct pci_dev *pdev;
+
+ if (WARN_ON_ONCE(!dev_is_pci(dev)))
+ return;
+
+ pdev = to_pci_dev(dev);
+
+ for_each_pci_msi_entry(entry, pdev) {
+ if (entry->irq) {
+ for (i = 0; i < entry->nvec_used; i++) {
+ hv_teardown_msi_irq_common(pdev, entry, entry->irq + i);
+ irq_domain_free_irqs(entry->irq + i, 1);
+ }
+ }
+ }
+}
+
+/*
+ * IRQ Chip for MSI PCI/PCI-X/PCI-Express Devices,
+ * which implement the MSI or MSI-X Capability Structure.
+ */
+static struct irq_chip hv_pci_msi_controller = {
+ .name = "HV-PCI-MSI",
+ .irq_unmask = pci_msi_unmask_irq,
+ .irq_mask = pci_msi_mask_irq,
+ .irq_ack = irq_chip_ack_parent,
+ .irq_retrigger = irq_chip_retrigger_hierarchy,
+ .irq_compose_msi_msg = hv_irq_compose_msi_msg,
+ .irq_set_affinity = msi_domain_set_affinity,
+ .flags = IRQCHIP_SKIP_SET_WAKE,
+};
+
+static struct msi_domain_ops pci_msi_domain_ops = {
+ .domain_free_irqs = hv_msi_domain_free_irqs,
+ .msi_prepare = pci_msi_prepare,
+};
+
+static struct msi_domain_info hv_pci_msi_domain_info = {
+ .flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
+ MSI_FLAG_PCI_MSIX,
+ .ops = &pci_msi_domain_ops,
+ .chip = &hv_pci_msi_controller,
+ .handler = handle_edge_irq,
+ .handler_name = "edge",
+};
+
+struct irq_domain * __init hv_create_pci_msi_domain(void)
+{
+ struct irq_domain *d = NULL;
+ struct fwnode_handle *fn;
+
+ fn = irq_domain_alloc_named_fwnode("HV-PCI-MSI");
+ if (fn)
+ d = pci_msi_create_irq_domain(fn, &hv_pci_msi_domain_info, x86_vector_domain);
+
+ /* No point in going further if we can't get an irq domain */
+ BUG_ON(!d);
+
+ return d;
+}
+
+#endif /* CONFIG_PCI_MSI */
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index cbee72550a12..ccc849e25d5e 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -261,6 +261,8 @@ static inline void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry,
msi_entry->data.as_uint32 = msi_desc->msg.data;
}

+struct irq_domain *hv_create_pci_msi_domain(void);
+
#else /* CONFIG_HYPERV */
static inline void hyperv_init(void) {}
static inline void hyperv_setup_mmu_ops(void) {}
--
2.20.1

2021-02-03 15:13:17

by Wei Liu

Subject: [PATCH v6 14/16] asm-generic/hyperv: import data structures for mapping device interrupts

Signed-off-by: Sunil Muthuswamy <[email protected]>
Co-Developed-by: Sunil Muthuswamy <[email protected]>
Signed-off-by: Wei Liu <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
---
arch/x86/include/asm/hyperv-tlfs.h | 13 +++++++++++
include/asm-generic/hyperv-tlfs.h | 36 ++++++++++++++++++++++++++++++
2 files changed, 49 insertions(+)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 204010350604..ab7d6cde548d 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -533,6 +533,19 @@ struct hv_partition_assist_pg {
u32 tlb_lock_count;
};

+enum hv_interrupt_type {
+ HV_X64_INTERRUPT_TYPE_FIXED = 0x0000,
+ HV_X64_INTERRUPT_TYPE_LOWESTPRIORITY = 0x0001,
+ HV_X64_INTERRUPT_TYPE_SMI = 0x0002,
+ HV_X64_INTERRUPT_TYPE_REMOTEREAD = 0x0003,
+ HV_X64_INTERRUPT_TYPE_NMI = 0x0004,
+ HV_X64_INTERRUPT_TYPE_INIT = 0x0005,
+ HV_X64_INTERRUPT_TYPE_SIPI = 0x0006,
+ HV_X64_INTERRUPT_TYPE_EXTINT = 0x0007,
+ HV_X64_INTERRUPT_TYPE_LOCALINT0 = 0x0008,
+ HV_X64_INTERRUPT_TYPE_LOCALINT1 = 0x0009,
+ HV_X64_INTERRUPT_TYPE_MAXIMUM = 0x000A,
+};

#include <asm-generic/hyperv-tlfs.h>

diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index ce53c0db28ae..a2eaed1b79e5 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -152,6 +152,8 @@ struct ms_hyperv_tsc_page {
#define HVCALL_RETRIEVE_DEBUG_DATA 0x006a
#define HVCALL_RESET_DEBUG_SESSION 0x006b
#define HVCALL_ADD_LOGICAL_PROCESSOR 0x0076
+#define HVCALL_MAP_DEVICE_INTERRUPT 0x007c
+#define HVCALL_UNMAP_DEVICE_INTERRUPT 0x007d
#define HVCALL_RETARGET_INTERRUPT 0x007e
#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
@@ -702,4 +704,38 @@ union hv_device_id {
} acpi;
} __packed;

+enum hv_interrupt_trigger_mode {
+ HV_INTERRUPT_TRIGGER_MODE_EDGE = 0,
+ HV_INTERRUPT_TRIGGER_MODE_LEVEL = 1,
+};
+
+struct hv_device_interrupt_descriptor {
+ u32 interrupt_type;
+ u32 trigger_mode;
+ u32 vector_count;
+ u32 reserved;
+ struct hv_device_interrupt_target target;
+} __packed;
+
+struct hv_input_map_device_interrupt {
+ u64 partition_id;
+ u64 device_id;
+ u64 flags;
+ struct hv_interrupt_entry logical_interrupt_entry;
+ struct hv_device_interrupt_descriptor interrupt_descriptor;
+} __packed;
+
+struct hv_output_map_device_interrupt {
+ struct hv_interrupt_entry interrupt_entry;
+} __packed;
+
+struct hv_input_unmap_device_interrupt {
+ u64 partition_id;
+ u64 device_id;
+ struct hv_interrupt_entry interrupt_entry;
+} __packed;
+
+#define HV_SOURCE_SHADOW_NONE 0x0
+#define HV_SOURCE_SHADOW_BRIDGE_BUS_RANGE 0x1
+
#endif
--
2.20.1

2021-02-03 15:13:33

by Wei Liu

Subject: [PATCH v6 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition

Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
Hypervisor when Linux runs as the root partition. Implement an IRQ
domain to handle mapping and unmapping of IO-APIC interrupts.
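
How a 64-bit IO-APIC redirection table entry is split into the two 32-bit
words that the compose callback consumes can be sketched as below. The
helper names are hypothetical, and the vector position (bits 7:0) is the
conventional IO-APIC RTE layout, stated here as an assumption rather than
taken from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pair of words, playing the role of w1/w2 in the patch. */
struct rte_words {
	uint32_t w1; /* low 32 bits of the RTE */
	uint32_t w2; /* high 32 bits of the RTE */
};

/* Split a raw 64-bit RTE into its low and high halves. */
static inline struct rte_words rte_split(uint64_t rte)
{
	struct rte_words w = {
		.w1 = (uint32_t)rte,
		.w2 = (uint32_t)(rte >> 32),
	};
	return w;
}

/* Assumption: the vector occupies bits 7:0 of the RTE. */
static inline uint8_t rte_vector(uint64_t rte)
{
	return (uint8_t)(rte & 0xff);
}

/* The patch treats an all-zero RTE as "no mapping stored". */
static inline bool rte_is_mapped(uint64_t rte)
{
	return rte != 0;
}
```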

Signed-off-by: Wei Liu <[email protected]>
---
v6:
1. Simplify code due to changes in a previous patch.
---
arch/x86/hyperv/irqdomain.c | 25 +++++
arch/x86/include/asm/mshyperv.h | 4 +
drivers/iommu/hyperv-iommu.c | 177 +++++++++++++++++++++++++++++++-
3 files changed, 203 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
index 117f17e8c88a..0cabc9aece38 100644
--- a/arch/x86/hyperv/irqdomain.c
+++ b/arch/x86/hyperv/irqdomain.c
@@ -360,3 +360,28 @@ struct irq_domain * __init hv_create_pci_msi_domain(void)
}

#endif /* CONFIG_PCI_MSI */
+
+int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry)
+{
+ union hv_device_id device_id;
+
+ device_id.as_uint64 = 0;
+ device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
+ device_id.ioapic.ioapic_id = (u8)ioapic_id;
+
+ return hv_unmap_interrupt(device_id.as_uint64, entry);
+}
+EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt);
+
+int hv_map_ioapic_interrupt(int ioapic_id, bool level, int cpu, int vector,
+ struct hv_interrupt_entry *entry)
+{
+ union hv_device_id device_id;
+
+ device_id.as_uint64 = 0;
+ device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
+ device_id.ioapic.ioapic_id = (u8)ioapic_id;
+
+ return hv_map_interrupt(device_id, level, cpu, vector, entry);
+}
+EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt);
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index ccc849e25d5e..345d7c6f8c37 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry,

struct irq_domain *hv_create_pci_msi_domain(void);

+int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
+ struct hv_interrupt_entry *entry);
+int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
+
#else /* CONFIG_HYPERV */
static inline void hyperv_init(void) {}
static inline void hyperv_setup_mmu_ops(void) {}
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 1d21a0b5f724..e285a220c913 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -20,6 +20,7 @@
#include <asm/io_apic.h>
#include <asm/irq_remapping.h>
#include <asm/hypervisor.h>
+#include <asm/mshyperv.h>

#include "irq_remapping.h"

@@ -115,30 +116,43 @@ static const struct irq_domain_ops hyperv_ir_domain_ops = {
.free = hyperv_irq_remapping_free,
};

+static const struct irq_domain_ops hyperv_root_ir_domain_ops;
static int __init hyperv_prepare_irq_remapping(void)
{
struct fwnode_handle *fn;
int i;
+ const char *name;
+ const struct irq_domain_ops *ops;

if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
x86_init.hyper.msi_ext_dest_id() ||
!x2apic_supported())
return -ENODEV;

- fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
+ if (hv_root_partition) {
+ name = "HYPERV-ROOT-IR";
+ ops = &hyperv_root_ir_domain_ops;
+ } else {
+ name = "HYPERV-IR";
+ ops = &hyperv_ir_domain_ops;
+ }
+
+ fn = irq_domain_alloc_named_id_fwnode(name, 0);
if (!fn)
return -ENOMEM;

ioapic_ir_domain =
irq_domain_create_hierarchy(arch_get_ir_parent_domain(),
- 0, IOAPIC_REMAPPING_ENTRY, fn,
- &hyperv_ir_domain_ops, NULL);
+ 0, IOAPIC_REMAPPING_ENTRY, fn, ops, NULL);

if (!ioapic_ir_domain) {
irq_domain_free_fwnode(fn);
return -ENOMEM;
}

+ if (hv_root_partition)
+ return 0; /* The rest is only relevant to guests */
+
/*
* Hyper-V doesn't provide irq remapping function for
* IO-APIC and so IO-APIC only accepts 8-bit APIC ID.
@@ -166,4 +180,161 @@ struct irq_remap_ops hyperv_irq_remap_ops = {
.enable = hyperv_enable_irq_remapping,
};

+/* IRQ remapping domain when Linux runs as the root partition */
+struct hyperv_root_ir_data {
+ u8 ioapic_id;
+ bool is_level;
+ struct hv_interrupt_entry entry;
+};
+
+static void
+hyperv_root_ir_compose_msi_msg(struct irq_data *irq_data, struct msi_msg *msg)
+{
+ u64 status;
+ u32 vector;
+ struct irq_cfg *cfg;
+ int ioapic_id;
+ struct cpumask *affinity;
+ int cpu;
+ struct hv_interrupt_entry entry;
+ struct hyperv_root_ir_data *data = irq_data->chip_data;
+ struct IO_APIC_route_entry e;
+
+ cfg = irqd_cfg(irq_data);
+ affinity = irq_data_get_effective_affinity_mask(irq_data);
+ cpu = cpumask_first_and(affinity, cpu_online_mask);
+
+ vector = cfg->vector;
+ ioapic_id = data->ioapic_id;
+
+ if (data->entry.source == HV_DEVICE_TYPE_IOAPIC
+ && data->entry.ioapic_rte.as_uint64) {
+ entry = data->entry;
+
+ status = hv_unmap_ioapic_interrupt(ioapic_id, &entry);
+
+ if (status != HV_STATUS_SUCCESS)
+ pr_debug("%s: unexpected unmap status %lld\n", __func__, status);
+
+ data->entry.ioapic_rte.as_uint64 = 0;
+ data->entry.source = 0; /* Invalid source */
+ }
+
+
+ status = hv_map_ioapic_interrupt(ioapic_id, data->is_level, cpu,
+ vector, &entry);
+
+ if (status != HV_STATUS_SUCCESS) {
+ pr_err("%s: map hypercall failed, status %lld\n", __func__, status);
+ return;
+ }
+
+ data->entry = entry;
+
+ /* Turn it into an IO_APIC_route_entry, and generate MSI MSG. */
+ e.w1 = entry.ioapic_rte.low_uint32;
+ e.w2 = entry.ioapic_rte.high_uint32;
+
+ memset(msg, 0, sizeof(*msg));
+ msg->arch_data.vector = e.vector;
+ msg->arch_data.delivery_mode = e.delivery_mode;
+ msg->arch_addr_lo.dest_mode_logical = e.dest_mode_logical;
+ msg->arch_addr_lo.dmar_format = e.ir_format;
+ msg->arch_addr_lo.dmar_index_0_14 = e.ir_index_0_14;
+}
+
+static int hyperv_root_ir_set_affinity(struct irq_data *data,
+ const struct cpumask *mask, bool force)
+{
+ struct irq_data *parent = data->parent_data;
+ struct irq_cfg *cfg = irqd_cfg(data);
+ int ret;
+
+ ret = parent->chip->irq_set_affinity(parent, mask, force);
+ if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
+ return ret;
+
+ send_cleanup_vector(cfg);
+
+ return 0;
+}
+
+static struct irq_chip hyperv_root_ir_chip = {
+ .name = "HYPERV-ROOT-IR",
+ .irq_ack = apic_ack_irq,
+ .irq_set_affinity = hyperv_root_ir_set_affinity,
+ .irq_compose_msi_msg = hyperv_root_ir_compose_msi_msg,
+};
+
+static int hyperv_root_irq_remapping_alloc(struct irq_domain *domain,
+ unsigned int virq, unsigned int nr_irqs,
+ void *arg)
+{
+ struct irq_alloc_info *info = arg;
+ struct irq_data *irq_data;
+ struct hyperv_root_ir_data *data;
+ int ret = 0;
+
+ if (!info || info->type != X86_IRQ_ALLOC_TYPE_IOAPIC || nr_irqs > 1)
+ return -EINVAL;
+
+ ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
+ if (ret < 0)
+ return ret;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL);
+ if (!data) {
+ irq_domain_free_irqs_common(domain, virq, nr_irqs);
+ return -ENOMEM;
+ }
+
+ irq_data = irq_domain_get_irq_data(domain, virq);
+ if (!irq_data) {
+ kfree(data);
+ irq_domain_free_irqs_common(domain, virq, nr_irqs);
+ return -EINVAL;
+ }
+
+ data->ioapic_id = info->devid;
+ data->is_level = info->ioapic.is_level;
+
+ irq_data->chip = &hyperv_root_ir_chip;
+ irq_data->chip_data = data;
+
+ return 0;
+}
+
+static void hyperv_root_irq_remapping_free(struct irq_domain *domain,
+ unsigned int virq, unsigned int nr_irqs)
+{
+ struct irq_data *irq_data;
+ struct hyperv_root_ir_data *data;
+ struct hv_interrupt_entry *e;
+ int i;
+
+ for (i = 0; i < nr_irqs; i++) {
+ irq_data = irq_domain_get_irq_data(domain, virq + i);
+
+ if (irq_data && irq_data->chip_data) {
+ data = irq_data->chip_data;
+ e = &data->entry;
+
+ if (e->source == HV_DEVICE_TYPE_IOAPIC
+ && e->ioapic_rte.as_uint64)
+ hv_unmap_ioapic_interrupt(data->ioapic_id,
+ &data->entry);
+
+ kfree(data);
+ }
+ }
+
+ irq_domain_free_irqs_common(domain, virq, nr_irqs);
+}
+
+static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
+ .select = hyperv_irq_remapping_select,
+ .alloc = hyperv_root_irq_remapping_alloc,
+ .free = hyperv_root_irq_remapping_free,
+};
+
#endif
--
2.20.1

2021-02-03 15:13:45

by Wei Liu

Subject: [PATCH v6 13/16] asm-generic/hyperv: introduce hv_device_id and auxiliary structures

We will need to identify the device we want Microsoft Hypervisor to
manipulate. Introduce the data structures for that purpose.

They will be used in a later patch.
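
As a rough illustration of the encoding, the sketch below builds the same
values with explicit shifts instead of bitfields. It assumes the usual
little-endian bitfield allocation (first field in the lowest bits), so the
device type lands in bits 63:62 of the 64-bit ID; the helper names are
hypothetical and are not part of this patch.

```c
#include <stdint.h>

/* Values taken from enum hv_device_type in this patch. */
#define DEVICE_TYPE_PCI    1ULL
#define DEVICE_TYPE_IOAPIC 2ULL

/* Requester ID: bus in bits 15:8, device in bits 7:3, function in bits 2:0. */
static inline uint16_t pci_rid(uint8_t bus, uint8_t device, uint8_t function)
{
	return (uint16_t)((bus << 8) | ((device & 0x1f) << 3) | (function & 0x07));
}

/* Assumption: device_type occupies the top two bits of the 64-bit ID. */
static inline unsigned int device_id_type(uint64_t id)
{
	return (unsigned int)(id >> 62);
}

/* IO-APIC device ID: ioapic_id in bits 7:0, reserved bits zero, type on top. */
static inline uint64_t ioapic_device_id(uint8_t ioapic_id)
{
	return (DEVICE_TYPE_IOAPIC << 62) | ioapic_id;
}
```

Explicit shifts are used here only to keep the sketch independent of the
compiler's bitfield layout; the patch itself relies on packed bitfields.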

Signed-off-by: Sunil Muthuswamy <[email protected]>
Co-Developed-by: Sunil Muthuswamy <[email protected]>
Signed-off-by: Wei Liu <[email protected]>
---
v6:
1. Add reserved0 as field name.
---
include/asm-generic/hyperv-tlfs.h | 79 +++++++++++++++++++++++++++++++
1 file changed, 79 insertions(+)

diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index 94c7d77bbf68..ce53c0db28ae 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -623,4 +623,83 @@ struct hv_set_vp_registers_input {
} element[];
} __packed;

+enum hv_device_type {
+ HV_DEVICE_TYPE_LOGICAL = 0,
+ HV_DEVICE_TYPE_PCI = 1,
+ HV_DEVICE_TYPE_IOAPIC = 2,
+ HV_DEVICE_TYPE_ACPI = 3,
+};
+
+typedef u16 hv_pci_rid;
+typedef u16 hv_pci_segment;
+typedef u64 hv_logical_device_id;
+union hv_pci_bdf {
+ u16 as_uint16;
+
+ struct {
+ u8 function:3;
+ u8 device:5;
+ u8 bus;
+ };
+} __packed;
+
+union hv_pci_bus_range {
+ u16 as_uint16;
+
+ struct {
+ u8 subordinate_bus;
+ u8 secondary_bus;
+ };
+} __packed;
+
+union hv_device_id {
+ u64 as_uint64;
+
+ struct {
+ u64 reserved0:62;
+ u64 device_type:2;
+ };
+
+ /* HV_DEVICE_TYPE_LOGICAL */
+ struct {
+ u64 id:62;
+ u64 device_type:2;
+ } logical;
+
+ /* HV_DEVICE_TYPE_PCI */
+ struct {
+ union {
+ hv_pci_rid rid;
+ union hv_pci_bdf bdf;
+ };
+
+ hv_pci_segment segment;
+ union hv_pci_bus_range shadow_bus_range;
+
+ u16 phantom_function_bits:2;
+ u16 source_shadow:1;
+
+ u16 rsvdz0:11;
+ u16 device_type:2;
+ } pci;
+
+ /* HV_DEVICE_TYPE_IOAPIC */
+ struct {
+ u8 ioapic_id;
+ u8 rsvdz0;
+ u16 rsvdz1;
+ u16 rsvdz2;
+
+ u16 rsvdz3:14;
+ u16 device_type:2;
+ } ioapic;
+
+ /* HV_DEVICE_TYPE_ACPI */
+ struct {
+ u32 input_mapping_base;
+ u32 input_mapping_count:30;
+ u32 device_type:2;
+ } acpi;
+} __packed;
+
#endif
--
2.20.1
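
For readers following the layout above, here is a minimal userspace sketch of how the IO-APIC variant of hv_device_id encodes to a raw 64-bit value. It uses explicit shifts instead of bitfields, and it assumes the little-endian, LSB-first bitfield layout that GCC uses on x86. The helper names make_ioapic_device_id() and device_type_of() are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Value copied from enum hv_device_type in the patch. */
enum { DEVICE_TYPE_IOAPIC = 2 };

/* device_type is a 2-bit field, so every valid type must fit in 0..3. */
static_assert(DEVICE_TYPE_IOAPIC < 4, "device_type is a 2-bit field");

/*
 * Hypothetical helper: bits 0-7 carry the IO-APIC ID, bits 8-61 are
 * reserved (zero), bits 62-63 carry the device type.
 */
static inline uint64_t make_ioapic_device_id(uint8_t ioapic_id)
{
	return ((uint64_t)DEVICE_TYPE_IOAPIC << 62) | ioapic_id;
}

/* Hypothetical helper: recover the device type from the raw value. */
static inline unsigned int device_type_of(uint64_t device_id)
{
	return (unsigned int)(device_id >> 62);
}
```

Because every variant keeps device_type in the top two bits, the type can be read from as_uint64 without knowing which variant was filled in.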

2021-02-03 15:13:46

by Wei Liu

Subject: [PATCH v6 11/16] asm-generic/hyperv: update hv_msi_entry

We will soon need to access fields inside the MSI address and MSI data
fields. Introduce hv_msi_address_register and hv_msi_data_register.

Fix up one user of hv_msi_entry in mshyperv.h.

No functional change expected.

Signed-off-by: Wei Liu <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
---
arch/x86/include/asm/mshyperv.h | 4 ++--
include/asm-generic/hyperv-tlfs.h | 28 ++++++++++++++++++++++++++--
2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 4e590a167160..cbee72550a12 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -257,8 +257,8 @@ static inline void hv_apic_init(void) {}
static inline void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry,
struct msi_desc *msi_desc)
{
- msi_entry->address = msi_desc->msg.address_lo;
- msi_entry->data = msi_desc->msg.data;
+ msi_entry->address.as_uint32 = msi_desc->msg.address_lo;
+ msi_entry->data.as_uint32 = msi_desc->msg.data;
}

#else /* CONFIG_HYPERV */
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index 69de4e3d89d3..4669f9a4e1f1 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -480,12 +480,36 @@ struct hv_create_vp {
u64 flags;
} __packed;

+union hv_msi_address_register {
+ u32 as_uint32;
+ struct {
+ u32 reserved1:2;
+ u32 destination_mode:1;
+ u32 redirection_hint:1;
+ u32 reserved2:8;
+ u32 destination_id:8;
+ u32 msi_base:12;
+ };
+} __packed;
+
+union hv_msi_data_register {
+ u32 as_uint32;
+ struct {
+ u32 vector:8;
+ u32 delivery_mode:3;
+ u32 reserved1:3;
+ u32 level_assert:1;
+ u32 trigger_mode:1;
+ u32 reserved2:16;
+ };
+} __packed;
+
/* HvRetargetDeviceInterrupt hypercall */
union hv_msi_entry {
u64 as_uint64;
struct {
- u32 address;
- u32 data;
+ union hv_msi_address_register address;
+ union hv_msi_data_register data;
} __packed;
};

--
2.20.1
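
As an illustration of the field layout introduced above, the following userspace sketch decodes a raw MSI address and data word with shifts and masks. The bit positions are derived from the bitfield widths in hv_msi_address_register and hv_msi_data_register, assuming the LSB-first layout GCC uses on x86. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional x86 MSI address window: the top 12 bits are 0xFEE. */
static_assert((0xFEE00000u >> 20) == 0xFEE, "x86 MSI base");

/* Hypothetical container for the decoded address fields. */
struct msi_addr_fields {
	unsigned int destination_mode;	/* bit 2 */
	unsigned int redirection_hint;	/* bit 3 */
	unsigned int destination_id;	/* bits 12-19 */
	unsigned int msi_base;		/* bits 20-31 */
};

static inline struct msi_addr_fields decode_msi_address(uint32_t addr)
{
	struct msi_addr_fields f = {
		.destination_mode = (addr >> 2) & 0x1,
		.redirection_hint = (addr >> 3) & 0x1,
		.destination_id   = (addr >> 12) & 0xff,
		.msi_base         = (addr >> 20) & 0xfff,
	};
	return f;
}

/* vector occupies bits 0-7 of the data word. */
static inline unsigned int msi_data_vector(uint32_t data)
{
	return data & 0xff;
}

/* delivery_mode occupies bits 8-10 of the data word. */
static inline unsigned int msi_data_delivery_mode(uint32_t data)
{
	return (data >> 8) & 0x7;
}
```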

2021-02-03 15:15:35

by Wei Liu

Subject: [PATCH v6 12/16] asm-generic/hyperv: update hv_interrupt_entry

We will soon use the same structure to handle IO-APIC interrupts as
well. Introduce an enum to identify the source and a data structure for
IO-APIC RTE.

While at it, update pci-hyperv.c to use the enum.

No functional change.

Signed-off-by: Wei Liu <[email protected]>
Acked-by: Rob Herring <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
---
drivers/pci/controller/pci-hyperv.c | 2 +-
include/asm-generic/hyperv-tlfs.h | 36 +++++++++++++++++++++++++++--
2 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 6db8d96a78eb..87aa62ee0368 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1216,7 +1216,7 @@ static void hv_irq_unmask(struct irq_data *data)
params = &hbus->retarget_msi_interrupt_params;
memset(params, 0, sizeof(*params));
params->partition_id = HV_PARTITION_ID_SELF;
- params->int_entry.source = 1; /* MSI(-X) */
+ params->int_entry.source = HV_INTERRUPT_SOURCE_MSI;
hv_set_msi_entry_from_desc(&params->int_entry.msi_entry, msi_desc);
params->device_id = (hbus->hdev->dev_instance.b[5] << 24) |
(hbus->hdev->dev_instance.b[4] << 16) |
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index 4669f9a4e1f1..94c7d77bbf68 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -480,6 +480,11 @@ struct hv_create_vp {
u64 flags;
} __packed;

+enum hv_interrupt_source {
+ HV_INTERRUPT_SOURCE_MSI = 1, /* MSI and MSI-X */
+ HV_INTERRUPT_SOURCE_IOAPIC,
+};
+
union hv_msi_address_register {
u32 as_uint32;
struct {
@@ -513,10 +518,37 @@ union hv_msi_entry {
} __packed;
};

+union hv_ioapic_rte {
+ u64 as_uint64;
+
+ struct {
+ u32 vector:8;
+ u32 delivery_mode:3;
+ u32 destination_mode:1;
+ u32 delivery_status:1;
+ u32 interrupt_polarity:1;
+ u32 remote_irr:1;
+ u32 trigger_mode:1;
+ u32 interrupt_mask:1;
+ u32 reserved1:15;
+
+ u32 reserved2:24;
+ u32 destination_id:8;
+ };
+
+ struct {
+ u32 low_uint32;
+ u32 high_uint32;
+ };
+} __packed;
+
struct hv_interrupt_entry {
- u32 source; /* 1 for MSI(-X) */
+ u32 source;
u32 reserved1;
- union hv_msi_entry msi_entry;
+ union {
+ union hv_msi_entry msi_entry;
+ union hv_ioapic_rte ioapic_rte;
+ };
} __packed;

/*
--
2.20.1
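
The sketch below shows, in userspace C, how an IO-APIC RTE with the layout above splits into its low and high 32-bit words. Bit positions follow the bitfield widths in hv_ioapic_rte, assuming the LSB-first layout GCC uses on x86. Only three fields are packed for brevity, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors enum hv_interrupt_source: IOAPIC implicitly follows MSI = 1. */
enum { SOURCE_MSI = 1, SOURCE_IOAPIC };
static_assert(SOURCE_IOAPIC == 2, "IOAPIC source follows MSI");

/* Hypothetical helper: pack vector, trigger mode and destination ID. */
static inline uint64_t ioapic_rte_pack(uint8_t vector, bool level,
				       uint8_t dest_id)
{
	uint64_t rte = vector;			/* bits 0-7 */

	if (level)
		rte |= 1ULL << 15;		/* trigger_mode, bit 15 */
	rte |= (uint64_t)dest_id << 56;		/* destination_id, bits 56-63 */
	return rte;
}

/* Corresponds to low_uint32 in the second anonymous struct. */
static inline uint32_t ioapic_rte_low(uint64_t rte)
{
	return (uint32_t)rte;
}

/* Corresponds to high_uint32 in the second anonymous struct. */
static inline uint32_t ioapic_rte_high(uint64_t rte)
{
	return (uint32_t)(rte >> 32);
}
```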

2021-02-03 15:16:24

by Wei Liu

Subject: [PATCH v6 09/16] x86/hyperv: provide a bunch of helper functions

They are used to deposit pages into Microsoft Hypervisor and bring up
logical and virtual processors.

Signed-off-by: Lillian Grassin-Drake <[email protected]>
Signed-off-by: Sunil Muthuswamy <[email protected]>
Signed-off-by: Nuno Das Neves <[email protected]>
Co-Developed-by: Lillian Grassin-Drake <[email protected]>
Co-Developed-by: Sunil Muthuswamy <[email protected]>
Co-Developed-by: Nuno Das Neves <[email protected]>
Signed-off-by: Wei Liu <[email protected]>
---
v6:
1. Address Michael's comments.

v4: Fix compilation issue when CONFIG_ACPI_NUMA is not set.

v3:
1. Add __packed to structures.
2. Drop unnecessary exports.

v2:
1. Adapt to hypervisor side changes
2. Address Vitaly's comments

use u64 status

pages

major comments

minor comments

rely on acpi code
---
arch/x86/hyperv/Makefile | 2 +-
arch/x86/hyperv/hv_proc.c | 219 ++++++++++++++++++++++++++++++
arch/x86/include/asm/mshyperv.h | 4 +
include/asm-generic/hyperv-tlfs.h | 67 +++++++++
4 files changed, 291 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/hyperv/hv_proc.c

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 89b1f74d3225..565358020921 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-y := hv_init.o mmu.o nested.o
-obj-$(CONFIG_X86_64) += hv_apic.o
+obj-$(CONFIG_X86_64) += hv_apic.o hv_proc.o

ifdef CONFIG_X86_64
obj-$(CONFIG_PARAVIRT_SPINLOCKS) += hv_spinlock.o
diff --git a/arch/x86/hyperv/hv_proc.c b/arch/x86/hyperv/hv_proc.c
new file mode 100644
index 000000000000..60461e598239
--- /dev/null
+++ b/arch/x86/hyperv/hv_proc.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/types.h>
+#include <linux/version.h>
+#include <linux/vmalloc.h>
+#include <linux/mm.h>
+#include <linux/clockchips.h>
+#include <linux/acpi.h>
+#include <linux/hyperv.h>
+#include <linux/slab.h>
+#include <linux/cpuhotplug.h>
+#include <linux/minmax.h>
+#include <asm/hypervisor.h>
+#include <asm/mshyperv.h>
+#include <asm/apic.h>
+
+#include <asm/trace/hyperv.h>
+
+/*
+ * See struct hv_deposit_memory. The first u64 is partition ID, the rest
+ * are GPAs.
+ */
+#define HV_DEPOSIT_MAX (HV_HYP_PAGE_SIZE / sizeof(u64) - 1)
+
+/* Deposits exact number of pages. Must be called with interrupts enabled. */
+int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
+{
+ struct page **pages, *page;
+ int *counts;
+ int num_allocations;
+ int i, j, page_count;
+ int order;
+ u64 status;
+ int ret;
+ u64 base_pfn;
+ struct hv_deposit_memory *input_page;
+ unsigned long flags;
+
+ if (num_pages > HV_DEPOSIT_MAX)
+ return -E2BIG;
+ if (!num_pages)
+ return 0;
+
+ /* One buffer for page pointers and counts */
+ page = alloc_page(GFP_KERNEL);
+ if (!page)
+ return -ENOMEM;
+ pages = page_address(page);
+
+ counts = kcalloc(HV_DEPOSIT_MAX, sizeof(int), GFP_KERNEL);
+ if (!counts) {
+ free_page((unsigned long)pages);
+ return -ENOMEM;
+ }
+
+ /* Allocate all the pages before disabling interrupts */
+ i = 0;
+
+ while (num_pages) {
+ /* Find highest order we can actually allocate */
+ order = 31 - __builtin_clz(num_pages);
+
+ while (1) {
+ pages[i] = alloc_pages_node(node, GFP_KERNEL, order);
+ if (pages[i])
+ break;
+ if (!order) {
+ ret = -ENOMEM;
+ num_allocations = i;
+ goto err_free_allocations;
+ }
+ --order;
+ }
+
+ split_page(pages[i], order);
+ counts[i] = 1 << order;
+ num_pages -= counts[i];
+ i++;
+ }
+ num_allocations = i;
+
+ local_irq_save(flags);
+
+ input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
+
+ input_page->partition_id = partition_id;
+
+ /* Populate gpa_page_list - these will fit on the input page */
+ for (i = 0, page_count = 0; i < num_allocations; ++i) {
+ base_pfn = page_to_pfn(pages[i]);
+ for (j = 0; j < counts[i]; ++j, ++page_count)
+ input_page->gpa_page_list[page_count] = base_pfn + j;
+ }
+ status = hv_do_rep_hypercall(HVCALL_DEPOSIT_MEMORY,
+ page_count, 0, input_page, NULL);
+ local_irq_restore(flags);
+
+ if ((status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS) {
+ pr_err("Failed to deposit pages: %lld\n", status);
+ ret = status;
+ goto err_free_allocations;
+ }
+
+ ret = 0;
+ goto free_buf;
+
+err_free_allocations:
+ for (i = 0; i < num_allocations; ++i) {
+ base_pfn = page_to_pfn(pages[i]);
+ for (j = 0; j < counts[i]; ++j)
+ __free_page(pfn_to_page(base_pfn + j));
+ }
+
+free_buf:
+ free_page((unsigned long)pages);
+ kfree(counts);
+ return ret;
+}
+
+int hv_call_add_logical_proc(int node, u32 lp_index, u32 apic_id)
+{
+ struct hv_add_logical_processor_in *input;
+ struct hv_add_logical_processor_out *output;
+ u64 status;
+ unsigned long flags;
+ int ret = 0;
+ int pxm = node_to_pxm(node);
+
+ /*
+ * When adding a logical processor, the hypervisor may return
+ * HV_STATUS_INSUFFICIENT_MEMORY. When that happens, we deposit more
+ * pages and retry.
+ */
+ do {
+ local_irq_save(flags);
+
+ input = *this_cpu_ptr(hyperv_pcpu_input_arg);
+ /* We don't do anything with the output right now */
+ output = *this_cpu_ptr(hyperv_pcpu_output_arg);
+
+ input->lp_index = lp_index;
+ input->apic_id = apic_id;
+ input->flags = 0;
+ input->proximity_domain_info.domain_id = pxm;
+ input->proximity_domain_info.flags.reserved = 0;
+ input->proximity_domain_info.flags.proximity_info_valid = 1;
+ input->proximity_domain_info.flags.proximity_preferred = 1;
+ status = hv_do_hypercall(HVCALL_ADD_LOGICAL_PROCESSOR,
+ input, output);
+ local_irq_restore(flags);
+
+ status &= HV_HYPERCALL_RESULT_MASK;
+
+ if (status != HV_STATUS_INSUFFICIENT_MEMORY) {
+ if (status != HV_STATUS_SUCCESS) {
+ pr_err("%s: cpu %u apic ID %u, %lld\n", __func__,
+ lp_index, apic_id, status);
+ ret = status;
+ }
+ break;
+ }
+ ret = hv_call_deposit_pages(node, hv_current_partition_id, 1);
+ } while (!ret);
+
+ return ret;
+}
+
+int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags)
+{
+ struct hv_create_vp *input;
+ u64 status;
+ unsigned long irq_flags;
+ int ret = 0;
+ int pxm = node_to_pxm(node);
+
+ /* Root VPs don't seem to need pages deposited */
+ if (partition_id != hv_current_partition_id) {
+ /* The value 90 is empirically determined. It may change. */
+ ret = hv_call_deposit_pages(node, partition_id, 90);
+ if (ret)
+ return ret;
+ }
+
+ do {
+ local_irq_save(irq_flags);
+
+ input = *this_cpu_ptr(hyperv_pcpu_input_arg);
+
+ input->partition_id = partition_id;
+ input->vp_index = vp_index;
+ input->flags = flags;
+ input->subnode_type = HvSubnodeAny;
+ if (node != NUMA_NO_NODE) {
+ input->proximity_domain_info.domain_id = pxm;
+ input->proximity_domain_info.flags.reserved = 0;
+ input->proximity_domain_info.flags.proximity_info_valid = 1;
+ input->proximity_domain_info.flags.proximity_preferred = 1;
+ } else {
+ input->proximity_domain_info.as_uint64 = 0;
+ }
+ status = hv_do_hypercall(HVCALL_CREATE_VP, input, NULL);
+ local_irq_restore(irq_flags);
+
+ status &= HV_HYPERCALL_RESULT_MASK;
+
+ if (status != HV_STATUS_INSUFFICIENT_MEMORY) {
+ if (status != HV_STATUS_SUCCESS) {
+ pr_err("%s: vcpu %u, lp %u, %lld\n", __func__,
+ vp_index, flags, status);
+ ret = status;
+ }
+ break;
+ }
+ ret = hv_call_deposit_pages(node, partition_id, 1);
+
+ } while (!ret);
+
+ return ret;
+}
+
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 67f5d35a73d3..4e590a167160 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -80,6 +80,10 @@ extern void __percpu **hyperv_pcpu_output_arg;

extern u64 hv_current_partition_id;

+int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
+int hv_call_add_logical_proc(int node, u32 lp_index, u32 apic_id);
+int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
+
static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
{
u64 input_address = input ? virt_to_phys(input) : 0;
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index 87b1a79b19eb..69de4e3d89d3 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -142,6 +142,8 @@ struct ms_hyperv_tsc_page {
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX 0x0014
#define HVCALL_SEND_IPI_EX 0x0015
#define HVCALL_GET_PARTITION_ID 0x0046
+#define HVCALL_DEPOSIT_MEMORY 0x0048
+#define HVCALL_CREATE_VP 0x004e
#define HVCALL_GET_VP_REGISTERS 0x0050
#define HVCALL_SET_VP_REGISTERS 0x0051
#define HVCALL_POST_MESSAGE 0x005c
@@ -149,6 +151,7 @@ struct ms_hyperv_tsc_page {
#define HVCALL_POST_DEBUG_DATA 0x0069
#define HVCALL_RETRIEVE_DEBUG_DATA 0x006a
#define HVCALL_RESET_DEBUG_SESSION 0x006b
+#define HVCALL_ADD_LOGICAL_PROCESSOR 0x0076
#define HVCALL_RETARGET_INTERRUPT 0x007e
#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
@@ -413,6 +416,70 @@ struct hv_get_partition_id {
u64 partition_id;
} __packed;

+/* HvDepositMemory hypercall */
+struct hv_deposit_memory {
+ u64 partition_id;
+ u64 gpa_page_list[];
+} __packed;
+
+struct hv_proximity_domain_flags {
+ u32 proximity_preferred : 1;
+ u32 reserved : 30;
+ u32 proximity_info_valid : 1;
+} __packed;
+
+/* Not a union in Windows but useful for zeroing */
+union hv_proximity_domain_info {
+ struct {
+ u32 domain_id;
+ struct hv_proximity_domain_flags flags;
+ };
+ u64 as_uint64;
+} __packed;
+
+struct hv_lp_startup_status {
+ u64 hv_status;
+ u64 substatus1;
+ u64 substatus2;
+ u64 substatus3;
+ u64 substatus4;
+ u64 substatus5;
+ u64 substatus6;
+} __packed;
+
+/* HvAddLogicalProcessor hypercall */
+struct hv_add_logical_processor_in {
+ u32 lp_index;
+ u32 apic_id;
+ union hv_proximity_domain_info proximity_domain_info;
+ u64 flags;
+} __packed;
+
+struct hv_add_logical_processor_out {
+ struct hv_lp_startup_status startup_status;
+} __packed;
+
+enum HV_SUBNODE_TYPE
+{
+ HvSubnodeAny = 0,
+ HvSubnodeSocket = 1,
+ HvSubnodeAmdNode = 2,
+ HvSubnodeL3 = 3,
+ HvSubnodeCount = 4,
+ HvSubnodeInvalid = -1
+};
+
+/* HvCreateVp hypercall */
+struct hv_create_vp {
+ u64 partition_id;
+ u32 vp_index;
+ u8 padding[3];
+ u8 subnode_type;
+ u64 subnode_id;
+ union hv_proximity_domain_info proximity_domain_info;
+ u64 flags;
+} __packed;
+
/* HvRetargetDeviceInterrupt hypercall */
union hv_msi_entry {
u64 as_uint64;
--
2.20.1
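
To make the allocation strategy in hv_call_deposit_pages() easier to follow, here is a userspace model of how a page count is split into power-of-two chunks. It is a sketch only: it assumes every allocation succeeds at the highest possible order, whereas the kernel code falls back to lower orders on failure. HYP_PAGE_SIZE is assumed to be 4096, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define HYP_PAGE_SIZE 4096u	/* assumed hypervisor page size */
/* One u64 of the input page holds the partition ID, the rest hold GPAs. */
#define DEPOSIT_MAX (HYP_PAGE_SIZE / sizeof(uint64_t) - 1)

static_assert(DEPOSIT_MAX == 511, "a 4 KiB input page carries 511 GPAs");

/* Largest order such that (1 << order) <= n; n must be non-zero. */
static int floor_log2(uint32_t n)
{
	int order = 0;

	while (n >>= 1)
		order++;
	return order;
}

/*
 * Split num_pages into chunks of decreasing power-of-two size.
 * Returns the number of chunks written to orders[], or -1 if the
 * request exceeds DEPOSIT_MAX or does not fit in max_chunks entries.
 */
static int split_into_orders(uint32_t num_pages, int *orders, int max_chunks)
{
	int n = 0;

	if (num_pages > DEPOSIT_MAX)
		return -1;

	while (num_pages) {
		int order = floor_log2(num_pages);

		if (n == max_chunks)
			return -1;
		orders[n++] = order;
		num_pages -= 1u << order;
	}
	return n;
}
```

For the 90 pages that hv_call_create_vp() deposits up front, this model yields chunks of 64, 16, 8 and 2 pages.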

2021-02-03 15:17:14

by Wei Liu

Subject: [PATCH v6 05/16] x86/hyperv: allocate output arg pages if required

When Linux runs as the root partition, it will need to make hypercalls
which return data from the hypervisor.

Allocate pages for storing results when Linux runs as the root
partition.

Signed-off-by: Lillian Grassin-Drake <[email protected]>
Co-Developed-by: Lillian Grassin-Drake <[email protected]>
Signed-off-by: Wei Liu <[email protected]>
---
v3: Fix hv_cpu_die to use free_pages.
v2: Address Vitaly's comments
---
arch/x86/hyperv/hv_init.c | 35 ++++++++++++++++++++++++++++-----
arch/x86/include/asm/mshyperv.h | 1 +
2 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index e04d90af4c27..6f4cb40e53fe 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -41,6 +41,9 @@ EXPORT_SYMBOL_GPL(hv_vp_assist_page);
void __percpu **hyperv_pcpu_input_arg;
EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);

+void __percpu **hyperv_pcpu_output_arg;
+EXPORT_SYMBOL_GPL(hyperv_pcpu_output_arg);
+
u32 hv_max_vp_index;
EXPORT_SYMBOL_GPL(hv_max_vp_index);

@@ -73,12 +76,19 @@ static int hv_cpu_init(unsigned int cpu)
void **input_arg;
struct page *pg;

- input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
- pg = alloc_page(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL);
+ pg = alloc_pages(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL, hv_root_partition ? 1 : 0);
if (unlikely(!pg))
return -ENOMEM;
+
+ input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
*input_arg = page_address(pg);
+ if (hv_root_partition) {
+ void **output_arg;
+
+ output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
+ *output_arg = page_address(pg + 1);
+ }

hv_get_vp_index(msr_vp_index);

@@ -205,14 +215,23 @@ static int hv_cpu_die(unsigned int cpu)
unsigned int new_cpu;
unsigned long flags;
void **input_arg;
- void *input_pg = NULL;
+ void *pg;

local_irq_save(flags);
input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
- input_pg = *input_arg;
+ pg = *input_arg;
*input_arg = NULL;
+
+ if (hv_root_partition) {
+ void **output_arg;
+
+ output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
+ *output_arg = NULL;
+ }
+
local_irq_restore(flags);
- free_page((unsigned long)input_pg);
+
+ free_pages((unsigned long)pg, hv_root_partition ? 1 : 0);

if (hv_vp_assist_page && hv_vp_assist_page[cpu])
wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
@@ -346,6 +365,12 @@ void __init hyperv_init(void)

BUG_ON(hyperv_pcpu_input_arg == NULL);

+ /* Allocate the per-CPU state for output arg for root */
+ if (hv_root_partition) {
+ hyperv_pcpu_output_arg = alloc_percpu(void *);
+ BUG_ON(hyperv_pcpu_output_arg == NULL);
+ }
+
/* Allocate percpu VP index */
hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index),
GFP_KERNEL);
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index ac2b0d110f03..62d9390f1ddf 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -76,6 +76,7 @@ static inline void hv_disable_stimer0_percpu_irq(int irq) {}
#if IS_ENABLED(CONFIG_HYPERV)
extern void *hv_hypercall_pg;
extern void __percpu **hyperv_pcpu_input_arg;
+extern void __percpu **hyperv_pcpu_output_arg;

static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
{
--
2.20.1
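
The allocation scheme in this patch can be modelled in userspace as follows: one contiguous two-page allocation when running as root, with the output argument page placed directly after the input page, and a single free that releases both. This is a sketch under stated assumptions: aligned_alloc() stands in for alloc_pages(), PAGE_SZ is assumed to be 4096, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SZ 4096u	/* assumed page size */

/* Hypothetical stand-in for the two per-CPU pointers. */
struct hv_arg_pages {
	void *input;
	void *output;	/* NULL when not running as root */
};

/* Returns 0 on success, -1 if the allocation fails. */
static int arg_pages_init(struct hv_arg_pages *p, bool root)
{
	size_t npages = root ? 2 : 1;	/* order 1 vs. order 0 */
	void *base;

	assert(p);
	base = aligned_alloc(PAGE_SZ, npages * PAGE_SZ);
	if (!base)
		return -1;

	p->input = base;
	/* The output page is the second page of the same allocation. */
	p->output = root ? (char *)base + PAGE_SZ : NULL;
	return 0;
}

static void arg_pages_free(struct hv_arg_pages *p)
{
	assert(p);
	free(p->input);	/* one free releases both pages */
	p->input = NULL;
	p->output = NULL;
}
```

Keeping both pages in a single allocation is why hv_cpu_die() only needs the input pointer to free everything.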

2021-02-03 15:18:26

by Wei Liu

Subject: [PATCH v6 10/16] x86/hyperv: implement and use hv_smp_prepare_cpus

Microsoft Hypervisor requires the root partition to make a few
hypercalls to set up application processors before they can be used.

Signed-off-by: Lillian Grassin-Drake <[email protected]>
Signed-off-by: Sunil Muthuswamy <[email protected]>
Co-Developed-by: Lillian Grassin-Drake <[email protected]>
Co-Developed-by: Sunil Muthuswamy <[email protected]>
Signed-off-by: Wei Liu <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
---
CPU hotplug and unplug are not yet supported in this setup, so those
paths remain untouched.

v3: Always call native SMP preparation function.
---
arch/x86/kernel/cpu/mshyperv.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index c376d191a260..13d3b6dd21a3 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -31,6 +31,7 @@
#include <asm/reboot.h>
#include <asm/nmi.h>
#include <clocksource/hyperv_timer.h>
+#include <asm/numa.h>

/* Is Linux running as the root partition? */
bool hv_root_partition;
@@ -212,6 +213,32 @@ static void __init hv_smp_prepare_boot_cpu(void)
hv_init_spinlocks();
#endif
}
+
+static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
+{
+#ifdef CONFIG_X86_64
+ int i;
+ int ret;
+#endif
+
+ native_smp_prepare_cpus(max_cpus);
+
+#ifdef CONFIG_X86_64
+ for_each_present_cpu(i) {
+ if (i == 0)
+ continue;
+ ret = hv_call_add_logical_proc(numa_cpu_node(i), i, cpu_physical_id(i));
+ BUG_ON(ret);
+ }
+
+ for_each_present_cpu(i) {
+ if (i == 0)
+ continue;
+ ret = hv_call_create_vp(numa_cpu_node(i), hv_current_partition_id, i, i);
+ BUG_ON(ret);
+ }
+#endif
+}
#endif

static void __init ms_hyperv_init_platform(void)
@@ -368,6 +395,8 @@ static void __init ms_hyperv_init_platform(void)

# ifdef CONFIG_SMP
smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;
+ if (hv_root_partition)
+ smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
# endif

/*
--
2.20.1
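
Both helpers called from hv_smp_prepare_cpus() follow the same pattern in hv_proc.c (patch 09/16): issue the hypercall, and if the hypervisor reports insufficient memory, deposit one more page and try again. The userspace sketch below models that loop against a fake hypervisor. Everything in it is hypothetical, including the status values, which are placeholders rather than the real HV_STATUS_* numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder status codes, not the real HV_STATUS_* values. */
enum {
	STATUS_SUCCESS = 0,
	STATUS_INSUFFICIENT_MEMORY = 1,
};

/* Fake hypervisor: the call succeeds once enough pages are deposited. */
struct fake_hv {
	unsigned int pages_needed;
	unsigned int pages_deposited;
};

static int fake_hypercall(const struct fake_hv *hv)
{
	return hv->pages_deposited >= hv->pages_needed ?
		STATUS_SUCCESS : STATUS_INSUFFICIENT_MEMORY;
}

/* Always succeeds in this model; the kernel helper can return an error. */
static int fake_deposit(struct fake_hv *hv, unsigned int num_pages)
{
	hv->pages_deposited += num_pages;
	return 0;
}

/* Same control flow as the do/while loops in hv_proc.c. */
static int call_with_deposit_retry(struct fake_hv *hv)
{
	int status;
	int ret = 0;

	assert(hv != NULL);
	do {
		status = fake_hypercall(hv);
		if (status != STATUS_INSUFFICIENT_MEMORY) {
			if (status != STATUS_SUCCESS)
				ret = status;
			break;
		}
		/* Out of memory: deposit one page, then retry. */
		ret = fake_deposit(hv, 1);
	} while (!ret);

	return ret;
}
```

The loop ends on success, on any error other than insufficient memory, or when the deposit itself fails.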

2021-02-03 15:19:39

by Wei Liu

Subject: [PATCH v6 08/16] ACPI / NUMA: add a stub function for node_to_pxm()

There is already a stub function for pxm_to_node() but the conversion
in the other direction is missing.

It will be used by Microsoft Hypervisor code later.

Signed-off-by: Wei Liu <[email protected]>
---
v6: new
---
include/acpi/acpi_numa.h | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/include/acpi/acpi_numa.h b/include/acpi/acpi_numa.h
index a4c6ef809e27..40a91ce87e04 100644
--- a/include/acpi/acpi_numa.h
+++ b/include/acpi/acpi_numa.h
@@ -30,6 +30,10 @@ static inline int pxm_to_node(int pxm)
{
return 0;
}
+static inline int node_to_pxm(int node)
+{
+ return 0;
+}
#endif /* CONFIG_ACPI_NUMA */

#ifdef CONFIG_ACPI_HMAT
--
2.20.1

2021-02-03 15:20:16

by Wei Liu

Subject: [PATCH v6 03/16] Drivers: hv: vmbus: skip VMBus initialization if Linux is root

When Linux runs as the root partition, neither VMBus nor the other
infrastructure initialized in hv_acpi_init() is available.

Signed-off-by: Wei Liu <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
---
v3: Return 0 instead of -ENODEV.
---
drivers/hv/vmbus_drv.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 502f8cd95f6d..ee27b3670a51 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2620,6 +2620,9 @@ static int __init hv_acpi_init(void)
if (!hv_is_hyperv_initialized())
return -ENODEV;

+ if (hv_root_partition)
+ return 0;
+
init_completion(&probe_event);

/*
--
2.20.1

2021-02-03 15:20:23

by Wei Liu

Subject: [PATCH v6 04/16] clocksource/hyperv: use MSR-based access if running as root

When Linux runs as the root partition, the setup required for the TSC
page is different. Luckily Linux also has access to the MSR-based
clocksource. We can just disable the TSC page clocksource if Linux is
the root partition.

Signed-off-by: Wei Liu <[email protected]>
Acked-by: Daniel Lezcano <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
---
drivers/clocksource/hyperv_timer.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index ba04cb381cd3..269a691bd2c4 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -426,6 +426,9 @@ static bool __init hv_init_tsc_clocksource(void)
if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE))
return false;

+ if (hv_root_partition)
+ return false;
+
hv_read_reference_counter = read_hv_clock_tsc;
phys_addr = virt_to_phys(hv_get_tsc_page());

--
2.20.1

2021-02-03 15:20:30

by Wei Liu

Subject: [PATCH v6 01/16] asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to HV_CPU_MANAGEMENT

This makes the name match Hyper-V TLFS.

Signed-off-by: Wei Liu <[email protected]>
Reviewed-by: Vitaly Kuznetsov <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
---
include/asm-generic/hyperv-tlfs.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index e73a11850055..e6903589a82a 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -88,7 +88,7 @@
#define HV_CONNECT_PORT BIT(7)
#define HV_ACCESS_STATS BIT(8)
#define HV_DEBUGGING BIT(11)
-#define HV_CPU_POWER_MANAGEMENT BIT(12)
+#define HV_CPU_MANAGEMENT BIT(12)


/*
--
2.20.1

2021-02-03 15:20:38

by Wei Liu

Subject: [PATCH v6 02/16] x86/hyperv: detect if Linux is the root partition

For now we can use the CPU management privilege flag to detect this.
Stash the result so it can be used later.

Put in a bunch of defines for future use when we want to have more
fine-grained detection.

Signed-off-by: Wei Liu <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
---
v3: move hv_root_partition to mshyperv.c
---
arch/x86/include/asm/hyperv-tlfs.h | 10 ++++++++++
arch/x86/include/asm/mshyperv.h | 2 ++
arch/x86/kernel/cpu/mshyperv.c | 20 ++++++++++++++++++++
3 files changed, 32 insertions(+)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 6bf42aed387e..204010350604 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -21,6 +21,7 @@
#define HYPERV_CPUID_FEATURES 0x40000003
#define HYPERV_CPUID_ENLIGHTMENT_INFO 0x40000004
#define HYPERV_CPUID_IMPLEMENT_LIMITS 0x40000005
+#define HYPERV_CPUID_CPU_MANAGEMENT_FEATURES 0x40000007
#define HYPERV_CPUID_NESTED_FEATURES 0x4000000A

#define HYPERV_CPUID_VIRT_STACK_INTERFACE 0x40000081
@@ -110,6 +111,15 @@
/* Recommend using enlightened VMCS */
#define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED BIT(14)

+/*
+ * CPU management features identification.
+ * These are HYPERV_CPUID_CPU_MANAGEMENT_FEATURES.EAX bits.
+ */
+#define HV_X64_START_LOGICAL_PROCESSOR BIT(0)
+#define HV_X64_CREATE_ROOT_VIRTUAL_PROCESSOR BIT(1)
+#define HV_X64_PERFORMANCE_COUNTER_SYNC BIT(2)
+#define HV_X64_RESERVED_IDENTITY_BIT BIT(31)
+
/*
* Virtual processor will never share a physical core with another virtual
* processor, except for virtual processors that are reported as sibling SMT
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index ffc289992d1b..ac2b0d110f03 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -237,6 +237,8 @@ int hyperv_fill_flush_guest_mapping_list(
struct hv_guest_mapping_flush_list *flush,
u64 start_gfn, u64 end_gfn);

+extern bool hv_root_partition;
+
#ifdef CONFIG_X86_64
void hv_apic_init(void);
void __init hv_init_spinlocks(void);
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index f628e3dc150f..c376d191a260 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -32,6 +32,10 @@
#include <asm/nmi.h>
#include <clocksource/hyperv_timer.h>

+/* Is Linux running as the root partition? */
+bool hv_root_partition;
+EXPORT_SYMBOL_GPL(hv_root_partition);
+
struct ms_hyperv_info ms_hyperv;
EXPORT_SYMBOL_GPL(ms_hyperv);

@@ -237,6 +241,22 @@ static void __init ms_hyperv_init_platform(void)
pr_debug("Hyper-V: max %u virtual processors, %u logical processors\n",
ms_hyperv.max_vp_index, ms_hyperv.max_lp_index);

+ /*
+ * Check CPU management privilege.
+ *
+ * To mirror what Windows does we should extract CPU management
+ * features and use the ReservedIdentityBit to detect if Linux is the
+ * root partition. But that requires negotiating CPU management
+ * interface (a process to be finalized).
+ *
+ * For now, use the privilege flag as the indicator for running as
+ * root.
+ */
+ if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_CPU_MANAGEMENT) {
+ hv_root_partition = true;
+ pr_info("Hyper-V: running as root partition\n");
+ }
+
/*
* Extract host information.
*/
--
2.20.1
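
The detection above comes down to testing a single bit. As a minimal sketch, assuming the caller has already read EBX from CPUID leaf 0x40000003 (HYPERV_CPUID_FEATURES), the check can be written as a pure function. The names CPU_MANAGEMENT_BIT and is_root_partition() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit as HV_CPU_MANAGEMENT, i.e. BIT(12) of the privilege flags. */
#define CPU_MANAGEMENT_BIT (UINT32_C(1) << 12)

static_assert(CPU_MANAGEMENT_BIT == 0x1000, "BIT(12)");

/*
 * Hypothetical helper: features_ebx is the EBX value returned by the
 * Hyper-V features CPUID leaf.
 */
static inline bool is_root_partition(uint32_t features_ebx)
{
	return (features_ebx & CPU_MANAGEMENT_BIT) != 0;
}
```

Keeping the test separate from the CPUID read makes it easy to swap in the ReservedIdentityBit check later, as the comment in the patch anticipates.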

2021-02-04 13:36:14

by Joerg Roedel

Subject: Re: [PATCH v6 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition

On Wed, Feb 03, 2021 at 03:04:35PM +0000, Wei Liu wrote:
> Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
> Hypervisor when Linux runs as the root partition. Implement an IRQ
> domain to handle mapping and unmapping of IO-APIC interrupts.
>
> Signed-off-by: Wei Liu <[email protected]>

Acked-by: Joerg Roedel <[email protected]>

> ---
> v6:
> 1. Simplify code due to changes in a previous patch.
> ---
> arch/x86/hyperv/irqdomain.c | 25 +++++
> arch/x86/include/asm/mshyperv.h | 4 +
> drivers/iommu/hyperv-iommu.c | 177 +++++++++++++++++++++++++++++++-
> 3 files changed, 203 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> index 117f17e8c88a..0cabc9aece38 100644
> --- a/arch/x86/hyperv/irqdomain.c
> +++ b/arch/x86/hyperv/irqdomain.c
> @@ -360,3 +360,28 @@ struct irq_domain * __init hv_create_pci_msi_domain(void)
> }
>
> #endif /* CONFIG_PCI_MSI */
> +
> +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry)
> +{
> + union hv_device_id device_id;
> +
> + device_id.as_uint64 = 0;
> + device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> + device_id.ioapic.ioapic_id = (u8)ioapic_id;
> +
> + return hv_unmap_interrupt(device_id.as_uint64, entry);
> +}
> +EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt);
> +
> +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int cpu, int vector,
> + struct hv_interrupt_entry *entry)
> +{
> + union hv_device_id device_id;
> +
> + device_id.as_uint64 = 0;
> + device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> + device_id.ioapic.ioapic_id = (u8)ioapic_id;
> +
> + return hv_map_interrupt(device_id, level, cpu, vector, entry);
> +}
> +EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ccc849e25d5e..345d7c6f8c37 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry,
>
> struct irq_domain *hv_create_pci_msi_domain(void);
>
> +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> + struct hv_interrupt_entry *entry);
> +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
> +
> #else /* CONFIG_HYPERV */
> static inline void hyperv_init(void) {}
> static inline void hyperv_setup_mmu_ops(void) {}
> diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> index 1d21a0b5f724..e285a220c913 100644
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -20,6 +20,7 @@
> #include <asm/io_apic.h>
> #include <asm/irq_remapping.h>
> #include <asm/hypervisor.h>
> +#include <asm/mshyperv.h>
>
> #include "irq_remapping.h"
>
> @@ -115,30 +116,43 @@ static const struct irq_domain_ops hyperv_ir_domain_ops = {
> .free = hyperv_irq_remapping_free,
> };
>
> +static const struct irq_domain_ops hyperv_root_ir_domain_ops;
> static int __init hyperv_prepare_irq_remapping(void)
> {
> struct fwnode_handle *fn;
> int i;
> + const char *name;
> + const struct irq_domain_ops *ops;
>
> if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
> x86_init.hyper.msi_ext_dest_id() ||
> !x2apic_supported())
> return -ENODEV;
>
> - fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
> + if (hv_root_partition) {
> + name = "HYPERV-ROOT-IR";
> + ops = &hyperv_root_ir_domain_ops;
> + } else {
> + name = "HYPERV-IR";
> + ops = &hyperv_ir_domain_ops;
> + }
> +
> + fn = irq_domain_alloc_named_id_fwnode(name, 0);
> if (!fn)
> return -ENOMEM;
>
> ioapic_ir_domain =
> irq_domain_create_hierarchy(arch_get_ir_parent_domain(),
> - 0, IOAPIC_REMAPPING_ENTRY, fn,
> - &hyperv_ir_domain_ops, NULL);
> + 0, IOAPIC_REMAPPING_ENTRY, fn, ops, NULL);
>
> if (!ioapic_ir_domain) {
> irq_domain_free_fwnode(fn);
> return -ENOMEM;
> }
>
> + if (hv_root_partition)
> + return 0; /* The rest is only relevant to guests */
> +
> /*
> * Hyper-V doesn't provide irq remapping function for
> * IO-APIC and so IO-APIC only accepts 8-bit APIC ID.
> @@ -166,4 +180,161 @@ struct irq_remap_ops hyperv_irq_remap_ops = {
> .enable = hyperv_enable_irq_remapping,
> };
>
> +/* IRQ remapping domain when Linux runs as the root partition */
> +struct hyperv_root_ir_data {
> + u8 ioapic_id;
> + bool is_level;
> + struct hv_interrupt_entry entry;
> +};
> +
> +static void
> +hyperv_root_ir_compose_msi_msg(struct irq_data *irq_data, struct msi_msg *msg)
> +{
> + u64 status;
> + u32 vector;
> + struct irq_cfg *cfg;
> + int ioapic_id;
> + struct cpumask *affinity;
> + int cpu;
> + struct hv_interrupt_entry entry;
> + struct hyperv_root_ir_data *data = irq_data->chip_data;
> + struct IO_APIC_route_entry e;
> +
> + cfg = irqd_cfg(irq_data);
> + affinity = irq_data_get_effective_affinity_mask(irq_data);
> + cpu = cpumask_first_and(affinity, cpu_online_mask);
> +
> + vector = cfg->vector;
> + ioapic_id = data->ioapic_id;
> +
> + if (data->entry.source == HV_DEVICE_TYPE_IOAPIC
> + && data->entry.ioapic_rte.as_uint64) {
> + entry = data->entry;
> +
> + status = hv_unmap_ioapic_interrupt(ioapic_id, &entry);
> +
> + if (status != HV_STATUS_SUCCESS)
> + pr_debug("%s: unexpected unmap status %lld\n", __func__, status);
> +
> + data->entry.ioapic_rte.as_uint64 = 0;
> + data->entry.source = 0; /* Invalid source */
> + }
> +
> +
> + status = hv_map_ioapic_interrupt(ioapic_id, data->is_level, cpu,
> + vector, &entry);
> +
> + if (status != HV_STATUS_SUCCESS) {
> + pr_err("%s: map hypercall failed, status %lld\n", __func__, status);
> + return;
> + }
> +
> + data->entry = entry;
> +
> + /* Turn it into an IO_APIC_route_entry, and generate MSI MSG. */
> + e.w1 = entry.ioapic_rte.low_uint32;
> + e.w2 = entry.ioapic_rte.high_uint32;
> +
> + memset(msg, 0, sizeof(*msg));
> + msg->arch_data.vector = e.vector;
> + msg->arch_data.delivery_mode = e.delivery_mode;
> + msg->arch_addr_lo.dest_mode_logical = e.dest_mode_logical;
> + msg->arch_addr_lo.dmar_format = e.ir_format;
> + msg->arch_addr_lo.dmar_index_0_14 = e.ir_index_0_14;
> +}
> +
> +static int hyperv_root_ir_set_affinity(struct irq_data *data,
> + const struct cpumask *mask, bool force)
> +{
> + struct irq_data *parent = data->parent_data;
> + struct irq_cfg *cfg = irqd_cfg(data);
> + int ret;
> +
> + ret = parent->chip->irq_set_affinity(parent, mask, force);
> + if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
> + return ret;
> +
> + send_cleanup_vector(cfg);
> +
> + return 0;
> +}
> +
> +static struct irq_chip hyperv_root_ir_chip = {
> + .name = "HYPERV-ROOT-IR",
> + .irq_ack = apic_ack_irq,
> + .irq_set_affinity = hyperv_root_ir_set_affinity,
> + .irq_compose_msi_msg = hyperv_root_ir_compose_msi_msg,
> +};
> +
> +static int hyperv_root_irq_remapping_alloc(struct irq_domain *domain,
> + unsigned int virq, unsigned int nr_irqs,
> + void *arg)
> +{
> + struct irq_alloc_info *info = arg;
> + struct irq_data *irq_data;
> + struct hyperv_root_ir_data *data;
> + int ret = 0;
> +
> + if (!info || info->type != X86_IRQ_ALLOC_TYPE_IOAPIC || nr_irqs > 1)
> + return -EINVAL;
> +
> + ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
> + if (ret < 0)
> + return ret;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL);
> + if (!data) {
> + irq_domain_free_irqs_common(domain, virq, nr_irqs);
> + return -ENOMEM;
> + }
> +
> + irq_data = irq_domain_get_irq_data(domain, virq);
> + if (!irq_data) {
> + kfree(data);
> + irq_domain_free_irqs_common(domain, virq, nr_irqs);
> + return -EINVAL;
> + }
> +
> + data->ioapic_id = info->devid;
> + data->is_level = info->ioapic.is_level;
> +
> + irq_data->chip = &hyperv_root_ir_chip;
> + irq_data->chip_data = data;
> +
> + return 0;
> +}
> +
> +static void hyperv_root_irq_remapping_free(struct irq_domain *domain,
> + unsigned int virq, unsigned int nr_irqs)
> +{
> + struct irq_data *irq_data;
> + struct hyperv_root_ir_data *data;
> + struct hv_interrupt_entry *e;
> + int i;
> +
> + for (i = 0; i < nr_irqs; i++) {
> + irq_data = irq_domain_get_irq_data(domain, virq + i);
> +
> + if (irq_data && irq_data->chip_data) {
> + data = irq_data->chip_data;
> + e = &data->entry;
> +
> + if (e->source == HV_DEVICE_TYPE_IOAPIC
> + && e->ioapic_rte.as_uint64)
> + hv_unmap_ioapic_interrupt(data->ioapic_id,
> + &data->entry);
> +
> + kfree(data);
> + }
> + }
> +
> + irq_domain_free_irqs_common(domain, virq, nr_irqs);
> +}
> +
> +static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
> + .select = hyperv_irq_remapping_select,
> + .alloc = hyperv_root_irq_remapping_alloc,
> + .free = hyperv_root_irq_remapping_free,
> +};
> +
> #endif
> --
> 2.20.1

2021-02-04 16:58:41

by Michael Kelley (LINUX)

Subject: RE: [PATCH v6 06/16] x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary

From: Wei Liu <[email protected]> Sent: Wednesday, February 3, 2021 7:04 AM
>
> We will need the partition ID for executing some hypercalls later.
>
> Signed-off-by: Lillian Grassin-Drake <[email protected]>
> Co-Developed-by: Sunil Muthuswamy <[email protected]>
> Signed-off-by: Wei Liu <[email protected]>
> ---
> v6:
> 1. Use u64 status.
>
> v3:
> 1. Make hv_get_partition_id static.
> 2. Change code structure a bit.
> ---
> arch/x86/hyperv/hv_init.c | 26 ++++++++++++++++++++++++++
> arch/x86/include/asm/mshyperv.h | 2 ++
> include/asm-generic/hyperv-tlfs.h | 6 ++++++
> 3 files changed, 34 insertions(+)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 6f4cb40e53fe..5b90a7290177 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -26,6 +26,9 @@
> #include <linux/syscore_ops.h>
> #include <clocksource/hyperv_timer.h>
>
> +u64 hv_current_partition_id = ~0ull;
> +EXPORT_SYMBOL_GPL(hv_current_partition_id);
> +
> void *hv_hypercall_pg;
> EXPORT_SYMBOL_GPL(hv_hypercall_pg);
>
> @@ -331,6 +334,24 @@ static struct syscore_ops hv_syscore_ops = {
> .resume = hv_resume,
> };
>
> +static void __init hv_get_partition_id(void)
> +{
> + struct hv_get_partition_id *output_page;
> + u64 status;
> + unsigned long flags;
> +
> + local_irq_save(flags);
> + output_page = *this_cpu_ptr(hyperv_pcpu_output_arg);
> + status = hv_do_hypercall(HVCALL_GET_PARTITION_ID, NULL, output_page);
> + if ((status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS) {
> + /* No point in proceeding if this failed */
> + pr_err("Failed to get partition ID: %lld\n", status);
> + BUG();
> + }
> + hv_current_partition_id = output_page->partition_id;
> + local_irq_restore(flags);
> +}
> +
> /*
> * This function is to be invoked early in the boot sequence after the
> * hypervisor has been detected.
> @@ -426,6 +447,11 @@ void __init hyperv_init(void)
>
> register_syscore_ops(&hv_syscore_ops);
>
> + if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_ACCESS_PARTITION_ID)
> + hv_get_partition_id();
> +
> + BUG_ON(hv_root_partition && hv_current_partition_id == ~0ull);
> +
> return;
>
> remove_cpuhp_state:
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 62d9390f1ddf..67f5d35a73d3 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -78,6 +78,8 @@ extern void *hv_hypercall_pg;
> extern void __percpu **hyperv_pcpu_input_arg;
> extern void __percpu **hyperv_pcpu_output_arg;
>
> +extern u64 hv_current_partition_id;
> +
> static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
> {
> u64 input_address = input ? virt_to_phys(input) : 0;
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index e6903589a82a..87b1a79b19eb 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -141,6 +141,7 @@ struct ms_hyperv_tsc_page {
> #define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX 0x0013
> #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX 0x0014
> #define HVCALL_SEND_IPI_EX 0x0015
> +#define HVCALL_GET_PARTITION_ID 0x0046
> #define HVCALL_GET_VP_REGISTERS 0x0050
> #define HVCALL_SET_VP_REGISTERS 0x0051
> #define HVCALL_POST_MESSAGE 0x005c
> @@ -407,6 +408,11 @@ struct hv_tlb_flush_ex {
> u64 gva_list[];
> } __packed;
>
> +/* HvGetPartitionId hypercall (output only) */
> +struct hv_get_partition_id {
> + u64 partition_id;
> +} __packed;
> +
> /* HvRetargetDeviceInterrupt hypercall */
> union hv_msi_entry {
> u64 as_uint64;
> --
> 2.20.1

Reviewed-by: Michael Kelley <[email protected]>
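
For readers following the status check in hv_get_partition_id(): the 64-bit
value returned by a hypercall carries more than a result code, which is why
the patch masks it with HV_HYPERCALL_RESULT_MASK before comparing against
HV_STATUS_SUCCESS. Below is a minimal userspace sketch of that decoding. The
model_* names are made up for illustration, and the field positions (result
code in bits 0-15, completed rep count in bits 32-43) are my reading of the
TLFS rather than something this patch defines.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the hypercall return value (illustrative names). */
#define MODEL_RESULT_MASK     UINT64_C(0xffff)          /* bits 0..15  */
#define MODEL_REP_COMP_SHIFT  32
#define MODEL_REP_COMP_MASK   (UINT64_C(0xfff) << 32)   /* bits 32..43 */
#define MODEL_STATUS_SUCCESS  0

static_assert((MODEL_RESULT_MASK & MODEL_REP_COMP_MASK) == 0,
	      "result and rep-count fields must not overlap");

/* Extract only the result code, as the patch does before comparing. */
static uint16_t model_hv_result(uint64_t status)
{
	return (uint16_t)(status & MODEL_RESULT_MASK);
}

/* Number of reps the hypervisor reports as completed. */
static unsigned int model_hv_reps_done(uint64_t status)
{
	return (unsigned int)((status & MODEL_REP_COMP_MASK) >> MODEL_REP_COMP_SHIFT);
}

/* A call succeeded if the masked result code is zero. */
static int model_hv_succeeded(uint64_t status)
{
	return model_hv_result(status) == MODEL_STATUS_SUCCESS;
}
```

With this layout, a successful rep hypercall that completed three reps
returns a non-zero 64-bit value, so comparing the raw status against
HV_STATUS_SUCCESS would wrongly report a failure.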

2021-02-04 17:01:09

by Michael Kelley (LINUX)

Subject: RE: [PATCH v6 05/16] x86/hyperv: allocate output arg pages if required

From: Wei Liu <[email protected]> Sent: Wednesday, February 3, 2021 7:04 AM
>
> When Linux runs as the root partition, it will need to make hypercalls
> which return data from the hypervisor.
>
> Allocate pages for storing results when Linux runs as the root
> partition.
>
> Signed-off-by: Lillian Grassin-Drake <[email protected]>
> Co-Developed-by: Lillian Grassin-Drake <[email protected]>
> Signed-off-by: Wei Liu <[email protected]>
> ---
> v3: Fix hv_cpu_die to use free_pages.
> v2: Address Vitaly's comments
> ---
> arch/x86/hyperv/hv_init.c | 35 ++++++++++++++++++++++++++++-----
> arch/x86/include/asm/mshyperv.h | 1 +
> 2 files changed, 31 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index e04d90af4c27..6f4cb40e53fe 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -41,6 +41,9 @@ EXPORT_SYMBOL_GPL(hv_vp_assist_page);
> void __percpu **hyperv_pcpu_input_arg;
> EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
>
> +void __percpu **hyperv_pcpu_output_arg;
> +EXPORT_SYMBOL_GPL(hyperv_pcpu_output_arg);
> +
> u32 hv_max_vp_index;
> EXPORT_SYMBOL_GPL(hv_max_vp_index);
>
> @@ -73,12 +76,19 @@ static int hv_cpu_init(unsigned int cpu)
> void **input_arg;
> struct page *pg;
>
> - input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> /* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
> - pg = alloc_page(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL);
> + pg = alloc_pages(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL, hv_root_partition ? 1 : 0);
> if (unlikely(!pg))
> return -ENOMEM;
> +
> + input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> *input_arg = page_address(pg);
> + if (hv_root_partition) {
> + void **output_arg;
> +
> + output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> + *output_arg = page_address(pg + 1);
> + }
>
> hv_get_vp_index(msr_vp_index);
>
> @@ -205,14 +215,23 @@ static int hv_cpu_die(unsigned int cpu)
> unsigned int new_cpu;
> unsigned long flags;
> void **input_arg;
> - void *input_pg = NULL;
> + void *pg;
>
> local_irq_save(flags);
> input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> - input_pg = *input_arg;
> + pg = *input_arg;
> *input_arg = NULL;
> +
> + if (hv_root_partition) {
> + void **output_arg;
> +
> + output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> + *output_arg = NULL;
> + }
> +
> local_irq_restore(flags);
> - free_page((unsigned long)input_pg);
> +
> + free_pages((unsigned long)pg, hv_root_partition ? 1 : 0);
>
> if (hv_vp_assist_page && hv_vp_assist_page[cpu])
> wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
> @@ -346,6 +365,12 @@ void __init hyperv_init(void)
>
> BUG_ON(hyperv_pcpu_input_arg == NULL);
>
> + /* Allocate the per-CPU state for output arg for root */
> + if (hv_root_partition) {
> + hyperv_pcpu_output_arg = alloc_percpu(void *);
> + BUG_ON(hyperv_pcpu_output_arg == NULL);
> + }
> +
> /* Allocate percpu VP index */
> hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index),
> GFP_KERNEL);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ac2b0d110f03..62d9390f1ddf 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -76,6 +76,7 @@ static inline void hv_disable_stimer0_percpu_irq(int irq) {}
> #if IS_ENABLED(CONFIG_HYPERV)
> extern void *hv_hypercall_pg;
> extern void __percpu **hyperv_pcpu_input_arg;
> +extern void __percpu **hyperv_pcpu_output_arg;
>
> static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
> {
> --
> 2.20.1

Reviewed-by: Michael Kelley <[email protected]>
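
As a note on the allocation scheme: when running as root, the patch
allocates one order-1 block (two contiguous pages) per CPU and uses the
second page as the output argument, so a single free_pages() call releases
both. A rough userspace model of that split follows. It assumes a 4 KiB page
and uses aligned_alloc() in place of alloc_pages(); all model_* names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096

struct model_arg_pages {
	void *input;	/* first page of the block */
	void *output;	/* second page, or NULL when not running as root */
};

/* Allocate one or two contiguous pages, mirroring order 0 vs order 1. */
static int model_alloc_arg_pages(struct model_arg_pages *p, int is_root)
{
	size_t npages = is_root ? 2 : 1;
	void *base;

	assert(p != NULL);
	base = aligned_alloc(MODEL_PAGE_SIZE, npages * MODEL_PAGE_SIZE);
	if (!base)
		return -1;

	p->input = base;
	p->output = is_root ? (char *)base + MODEL_PAGE_SIZE : NULL;
	return 0;
}

/* Both pages belong to one block, so only the base pointer is freed. */
static void model_free_arg_pages(struct model_arg_pages *p)
{
	assert(p != NULL);
	free(p->input);
	p->input = NULL;
	p->output = NULL;
}
```

The point of the single allocation is that the output pointer never needs to
be freed on its own, which matches hv_cpu_die() only clearing it.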

2021-02-04 17:20:36

by Michael Kelley (LINUX)

Subject: RE: [PATCH v6 13/16] asm-generic/hyperv: introduce hv_device_id and auxiliary structures

From: Wei Liu <[email protected]> Sent: Wednesday, February 3, 2021 7:05 AM
>
> We will need to identify the device we want Microsoft Hypervisor to
> manipulate. Introduce the data structures for that purpose.
>
> They will be used in a later patch.
>
> Signed-off-by: Sunil Muthuswamy <[email protected]>
> Co-Developed-by: Sunil Muthuswamy <[email protected]>
> Signed-off-by: Wei Liu <[email protected]>
> ---
> v6:
> 1. Add reserved0 as field name.
> ---
> include/asm-generic/hyperv-tlfs.h | 79 +++++++++++++++++++++++++++++++
> 1 file changed, 79 insertions(+)
>
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index 94c7d77bbf68..ce53c0db28ae 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -623,4 +623,83 @@ struct hv_set_vp_registers_input {
> } element[];
> } __packed;
>
> +enum hv_device_type {
> + HV_DEVICE_TYPE_LOGICAL = 0,
> + HV_DEVICE_TYPE_PCI = 1,
> + HV_DEVICE_TYPE_IOAPIC = 2,
> + HV_DEVICE_TYPE_ACPI = 3,
> +};
> +
> +typedef u16 hv_pci_rid;
> +typedef u16 hv_pci_segment;
> +typedef u64 hv_logical_device_id;
> +union hv_pci_bdf {
> + u16 as_uint16;
> +
> + struct {
> + u8 function:3;
> + u8 device:5;
> + u8 bus;
> + };
> +} __packed;
> +
> +union hv_pci_bus_range {
> + u16 as_uint16;
> +
> + struct {
> + u8 subordinate_bus;
> + u8 secondary_bus;
> + };
> +} __packed;
> +
> +union hv_device_id {
> + u64 as_uint64;
> +
> + struct {
> + u64 reserved0:62;
> + u64 device_type:2;
> + };
> +
> + /* HV_DEVICE_TYPE_LOGICAL */
> + struct {
> + u64 id:62;
> + u64 device_type:2;
> + } logical;
> +
> + /* HV_DEVICE_TYPE_PCI */
> + struct {
> + union {
> + hv_pci_rid rid;
> + union hv_pci_bdf bdf;
> + };
> +
> + hv_pci_segment segment;
> + union hv_pci_bus_range shadow_bus_range;
> +
> + u16 phantom_function_bits:2;
> + u16 source_shadow:1;
> +
> + u16 rsvdz0:11;
> + u16 device_type:2;
> + } pci;
> +
> + /* HV_DEVICE_TYPE_IOAPIC */
> + struct {
> + u8 ioapic_id;
> + u8 rsvdz0;
> + u16 rsvdz1;
> + u16 rsvdz2;
> +
> + u16 rsvdz3:14;
> + u16 device_type:2;
> + } ioapic;
> +
> + /* HV_DEVICE_TYPE_ACPI */
> + struct {
> + u32 input_mapping_base;
> + u32 input_mapping_count:30;
> + u32 device_type:2;
> + } acpi;
> +} __packed;
> +
> #endif
> --
> 2.20.1

Reviewed-by: Michael Kelley <[email protected]>
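
To illustrate how the variants of union hv_device_id overlay the same 64
bits, here is a small sketch that packs the IO-APIC and PCI forms with
explicit shifts instead of bitfields. It assumes the little-endian bitfield
layout that GCC uses on x86 (device_type in bits 62-63, PCI rid in bits
0-15, segment in bits 16-31) and leaves the shadow bus range and phantom
function bits at zero. The model_* helpers are hypothetical and not part of
the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same numeric values as enum hv_device_type in the patch. */
enum model_device_type {
	MODEL_DEVICE_TYPE_LOGICAL = 0,
	MODEL_DEVICE_TYPE_PCI     = 1,
	MODEL_DEVICE_TYPE_IOAPIC  = 2,
	MODEL_DEVICE_TYPE_ACPI    = 3,
};

/* IO-APIC form: id in the low byte, everything else reserved as zero. */
static uint64_t model_ioapic_device_id(uint8_t ioapic_id)
{
	return ((uint64_t)MODEL_DEVICE_TYPE_IOAPIC << 62) | ioapic_id;
}

/* PCI form: rid = bus[15:8] | device[7:3] | function[2:0], then segment. */
static uint64_t model_pci_device_id(uint16_t segment, uint8_t bus,
				    uint8_t dev, uint8_t fn)
{
	uint16_t rid;

	assert(dev < 32 && fn < 8);
	rid = (uint16_t)((bus << 8) | (dev << 3) | fn);

	return ((uint64_t)MODEL_DEVICE_TYPE_PCI << 62) |
	       ((uint64_t)segment << 16) | rid;
}

/* The type tag sits in the top two bits of every variant. */
static unsigned int model_device_type(uint64_t id)
{
	return (unsigned int)(id >> 62);
}
```

Because every variant keeps device_type in the top two bits, the hypervisor
can read the tag first and then interpret the remaining 62 bits.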

2021-02-04 17:21:22

by Michael Kelley (LINUX)

Subject: RE: [PATCH v6 09/16] x86/hyperv: provide a bunch of helper functions

From: Wei Liu <[email protected]> Sent: Wednesday, February 3, 2021 7:04 AM
>
> They are used to deposit pages into Microsoft Hypervisor and bring up
> logical and virtual processors.
>
> Signed-off-by: Lillian Grassin-Drake <[email protected]>
> Signed-off-by: Sunil Muthuswamy <[email protected]>
> Signed-off-by: Nuno Das Neves <[email protected]>
> Co-Developed-by: Lillian Grassin-Drake <[email protected]>
> Co-Developed-by: Sunil Muthuswamy <[email protected]>
> Co-Developed-by: Nuno Das Neves <[email protected]>
> Signed-off-by: Wei Liu <[email protected]>
> ---
> v6:
> 1. Address Michael's comments.
>
> v4: Fix compilation issue when CONFIG_ACPI_NUMA is not set.
>
> v3:
> 1. Add __packed to structures.
> 2. Drop unnecessary exports.
>
> v2:
> 1. Adapt to hypervisor side changes
> 2. Address Vitaly's comments
>
> use u64 status
>
> pages
>
> major comments
>
> minor comments
>
> rely on acpi code
> ---
> arch/x86/hyperv/Makefile | 2 +-
> arch/x86/hyperv/hv_proc.c | 219 ++++++++++++++++++++++++++++++
> arch/x86/include/asm/mshyperv.h | 4 +
> include/asm-generic/hyperv-tlfs.h | 67 +++++++++
> 4 files changed, 291 insertions(+), 1 deletion(-)
> create mode 100644 arch/x86/hyperv/hv_proc.c
>

Reviewed-by: Michael Kelley <[email protected]>

2021-02-04 17:46:34

by Michael Kelley (LINUX)

Subject: RE: [PATCH v6 15/16] x86/hyperv: implement an MSI domain for root partition

From: Wei Liu <[email protected]> Sent: Wednesday, February 3, 2021 7:05 AM
>
> When Linux runs as the root partition on Microsoft Hypervisor, its
> interrupts are remapped. Linux will need to explicitly map and unmap
> interrupts for hardware.
>
> Implement an MSI domain to issue the correct hypercalls. And initialize
> this irqdomain as the default MSI irq domain.
>
> Signed-off-by: Sunil Muthuswamy <[email protected]>
> Co-Developed-by: Sunil Muthuswamy <[email protected]>
> Signed-off-by: Wei Liu <[email protected]>
> ---
> v6:
> 1. Use u64 status.
> 2. Use vpset instead of bitmap.
> 3. Factor out hv_map_interrupt
> 4. Address other misc comments.
>
> v4: Fix compilation issue when CONFIG_PCI_MSI is not set.
> v3: build irqdomain.o for 32bit as well.
> v2: This patch is simplified due to upstream changes.
> ---
> arch/x86/hyperv/Makefile | 2 +-
> arch/x86/hyperv/hv_init.c | 9 +
> arch/x86/hyperv/irqdomain.c | 362 ++++++++++++++++++++++++++++++++
> arch/x86/include/asm/mshyperv.h | 2 +
> 4 files changed, 374 insertions(+), 1 deletion(-)
> create mode 100644 arch/x86/hyperv/irqdomain.c
>
> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
> index 565358020921..48e2c51464e8 100644
> --- a/arch/x86/hyperv/Makefile
> +++ b/arch/x86/hyperv/Makefile
> @@ -1,5 +1,5 @@
> # SPDX-License-Identifier: GPL-2.0-only
> -obj-y := hv_init.o mmu.o nested.o
> +obj-y := hv_init.o mmu.o nested.o irqdomain.o
> obj-$(CONFIG_X86_64) += hv_apic.o hv_proc.o
>
> ifdef CONFIG_X86_64
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 11c5997691f4..894ce899f0cb 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -483,6 +483,15 @@ void __init hyperv_init(void)
>
> BUG_ON(hv_root_partition && hv_current_partition_id == ~0ull);
>
> +#ifdef CONFIG_PCI_MSI
> + /*
> + * If we're running as root, we want to create our own PCI MSI domain.
> + * We can't set this in hv_pci_init because that would be too late.
> + */
> + if (hv_root_partition)
> + x86_init.irqs.create_pci_msi_domain = hv_create_pci_msi_domain;
> +#endif
> +
> return;
>
> remove_cpuhp_state:
> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> new file mode 100644
> index 000000000000..117f17e8c88a
> --- /dev/null
> +++ b/arch/x86/hyperv/irqdomain.c
> @@ -0,0 +1,362 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * for Linux to run as the root partition on Microsoft Hypervisor.

Nit: Looks like the initial word "Irqdomain" got dropped from the above
comment line. But don't respin just for this.

> + *
> + * Authors:
> + * Sunil Muthuswamy <[email protected]>
> + * Wei Liu <[email protected]>
> + */
> +
> +#include <linux/pci.h>
> +#include <linux/irq.h>
> +#include <asm/mshyperv.h>
> +
> +static int hv_map_interrupt(union hv_device_id device_id, bool level,
> + int cpu, int vector, struct hv_interrupt_entry *entry)
> +{
> + struct hv_input_map_device_interrupt *input;
> + struct hv_output_map_device_interrupt *output;
> + struct hv_device_interrupt_descriptor *intr_desc;
> + unsigned long flags;
> + u64 status;
> + cpumask_t mask = CPU_MASK_NONE;
> + int nr_bank, var_size;
> +
> + local_irq_save(flags);
> +
> + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> + output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> +
> + intr_desc = &input->interrupt_descriptor;
> + memset(input, 0, sizeof(*input));
> + input->partition_id = hv_current_partition_id;
> + input->device_id = device_id.as_uint64;
> + intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> + intr_desc->vector_count = 1;
> + intr_desc->target.vector = vector;
> +
> + if (level)
> + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> + else
> + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> +
> + cpumask_set_cpu(cpu, &mask);
> + intr_desc->target.vp_set.valid_bank_mask = 0;
> + intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> + nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), &mask);

There's a function get_cpu_mask() that returns a pointer to a cpumask with only
the specified cpu set in the mask. It returns a const pointer to the correct entry
in a pre-allocated array of all such cpumasks, so it's a lot more efficient than
allocating and initializing a local cpumask instance on the stack.
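
A rough userspace model of the difference, in case it helps. This is only a
sketch: the model_* names and the 256-CPU limit are invented for the
example, and the table here is filled lazily, which is not how the kernel
lays out its pre-allocated masks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_NR_CPUS 256
#define MODEL_WORDS   (MODEL_NR_CPUS / 64)

struct model_cpumask {
	uint64_t bits[MODEL_WORDS];
};

/* What the patch does: zero a mask on the stack, then set one bit. */
static struct model_cpumask model_mask_on_stack(int cpu)
{
	struct model_cpumask m;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	memset(&m, 0, sizeof(m));
	m.bits[cpu / 64] |= UINT64_C(1) << (cpu % 64);
	return m;
}

/* The suggestion: one read-only mask per CPU, filled in once. */
static struct model_cpumask model_single_cpu_masks[MODEL_NR_CPUS];
static int model_masks_ready;

static const struct model_cpumask *model_mask_of(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (!model_masks_ready) {
		for (int i = 0; i < MODEL_NR_CPUS; i++)
			model_single_cpu_masks[i].bits[i / 64] =
				UINT64_C(1) << (i % 64);
		model_masks_ready = 1;
	}
	/* Callers share the same const entry; nothing is copied. */
	return &model_single_cpu_masks[cpu];
}
```

In the patch this would presumably amount to passing cpumask_of(cpu), the
existing wrapper around get_cpu_mask(), to cpumask_to_vpset() and dropping
the local mask variable.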

> + if (nr_bank < 0) {
> + local_irq_restore(flags);
> + pr_err("%s: unable to generate VP set\n", __func__);
> + return EINVAL;
> + }
> + intr_desc->target.flags = HV_DEVICE_INTERRUPT_TARGET_PROCESSOR_SET;
> +
> + /*
> + * var-sized hypercall, var-size starts after vp_mask (thus
> + * vp_set.format does not count, but vp_set.valid_bank_mask
> + * does).
> + */
> + var_size = nr_bank + 1;
> +
> + status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, var_size,
> + input, output);
> + *entry = output->interrupt_entry;
> +
> + local_irq_restore(flags);
> +
> + if ((status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS)
> + pr_err("%s: hypercall failed, status %lld\n", __func__, status);
> +
> + return status & HV_HYPERCALL_RESULT_MASK;
> +}
> +
> +static int hv_unmap_interrupt(u64 id, struct hv_interrupt_entry *old_entry)
> +{
> + unsigned long flags;
> + struct hv_input_unmap_device_interrupt *input;
> + struct hv_interrupt_entry *intr_entry;
> + u64 status;
> +
> + local_irq_save(flags);
> + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> +
> + memset(input, 0, sizeof(*input));
> + intr_entry = &input->interrupt_entry;
> + input->partition_id = hv_current_partition_id;
> + input->device_id = id;
> + *intr_entry = *old_entry;
> +
> + status = hv_do_hypercall(HVCALL_UNMAP_DEVICE_INTERRUPT, input, NULL);
> + local_irq_restore(flags);
> +
> + return status & HV_HYPERCALL_RESULT_MASK;
> +}
> +
> +#ifdef CONFIG_PCI_MSI
> +struct rid_data {
> + struct pci_dev *bridge;
> + u32 rid;
> +};
> +
> +static int get_rid_cb(struct pci_dev *pdev, u16 alias, void *data)
> +{
> + struct rid_data *rd = data;
> + u8 bus = PCI_BUS_NUM(rd->rid);
> +
> + if (pdev->bus->number != bus || PCI_BUS_NUM(alias) != bus) {
> + rd->bridge = pdev;
> + rd->rid = alias;
> + }
> +
> + return 0;
> +}
> +
> +static union hv_device_id hv_build_pci_dev_id(struct pci_dev *dev)
> +{
> + union hv_device_id dev_id;
> + struct rid_data data = {
> + .bridge = NULL,
> + .rid = PCI_DEVID(dev->bus->number, dev->devfn)
> + };
> +
> + pci_for_each_dma_alias(dev, get_rid_cb, &data);
> +
> + dev_id.as_uint64 = 0;
> + dev_id.device_type = HV_DEVICE_TYPE_PCI;
> + dev_id.pci.segment = pci_domain_nr(dev->bus);
> +
> + dev_id.pci.bdf.bus = PCI_BUS_NUM(data.rid);
> + dev_id.pci.bdf.device = PCI_SLOT(data.rid);
> + dev_id.pci.bdf.function = PCI_FUNC(data.rid);
> + dev_id.pci.source_shadow = HV_SOURCE_SHADOW_NONE;
> +
> + if (data.bridge) {
> + int pos;
> +
> + /*
> + * Microsoft Hypervisor requires a bus range when the bridge is
> + * running in PCI-X mode.
> + *
> + * To distinguish conventional vs PCI-X bridge, we can check
> + * the bridge's PCI-X Secondary Status Register, Secondary Bus
> + * Mode and Frequency bits. See PCI Express to PCI/PCI-X Bridge
> + * Specification Revision 1.0 5.2.2.1.3.
> + *
> + * Value zero means it is in conventional mode, otherwise it is
> + * in PCI-X mode.
> + */
> +
> + pos = pci_find_capability(data.bridge, PCI_CAP_ID_PCIX);
> + if (pos) {
> + u16 status;
> +
> + pci_read_config_word(data.bridge, pos +
> + PCI_X_BRIDGE_SSTATUS, &status);
> +
> + if (status & PCI_X_SSTATUS_FREQ) {
> + /* Non-zero, PCI-X mode */
> + u8 sec_bus, sub_bus;
> +
> + dev_id.pci.source_shadow = HV_SOURCE_SHADOW_BRIDGE_BUS_RANGE;
> +
> + pci_read_config_byte(data.bridge, PCI_SECONDARY_BUS, &sec_bus);
> + dev_id.pci.shadow_bus_range.secondary_bus = sec_bus;
> + pci_read_config_byte(data.bridge, PCI_SUBORDINATE_BUS, &sub_bus);
> + dev_id.pci.shadow_bus_range.subordinate_bus = sub_bus;
> + }
> + }
> + }
> +
> + return dev_id;
> +}
> +
> +static int hv_map_msi_interrupt(struct pci_dev *dev, int cpu, int vector,
> + struct hv_interrupt_entry *entry)
> +{
> + union hv_device_id device_id = hv_build_pci_dev_id(dev);
> +
> + return hv_map_interrupt(device_id, false, cpu, vector, entry);
> +}
> +
> +static inline void entry_to_msi_msg(struct hv_interrupt_entry *entry, struct msi_msg *msg)
> +{
> + /* High address is always 0 */
> + msg->address_hi = 0;
> + msg->address_lo = entry->msi_entry.address.as_uint32;
> + msg->data = entry->msi_entry.data.as_uint32;
> +}
> +
> +static int hv_unmap_msi_interrupt(struct pci_dev *dev, struct hv_interrupt_entry *old_entry);
> +static void hv_irq_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
> +{
> + struct msi_desc *msidesc;
> + struct pci_dev *dev;
> + struct hv_interrupt_entry out_entry, *stored_entry;
> + struct irq_cfg *cfg = irqd_cfg(data);
> + cpumask_t *affinity;
> + int cpu;
> + u64 status;
> +
> + msidesc = irq_data_get_msi_desc(data);
> + dev = msi_desc_to_pci_dev(msidesc);
> +
> + if (!cfg) {
> + pr_debug("%s: cfg is NULL", __func__);
> + return;
> + }
> +
> + affinity = irq_data_get_effective_affinity_mask(data);
> + cpu = cpumask_first_and(affinity, cpu_online_mask);
> +
> + if (data->chip_data) {
> + /*
> + * This interrupt is already mapped. Let's unmap first.
> + *
> + * We don't use retarget interrupt hypercalls here because
> + * Microsoft Hypervisor doesn't allow root to change the vector
> + * or specify VPs outside of the set that is initially used
> + * during mapping.
> + */
> + stored_entry = data->chip_data;
> + data->chip_data = NULL;
> +
> + status = hv_unmap_msi_interrupt(dev, stored_entry);
> +
> + kfree(stored_entry);
> +
> + if (status != HV_STATUS_SUCCESS) {
> + pr_debug("%s: failed to unmap, status %lld", __func__, status);
> + return;
> + }
> + }
> +
> + stored_entry = kzalloc(sizeof(*stored_entry), GFP_ATOMIC);
> + if (!stored_entry) {
> + pr_debug("%s: failed to allocate chip data\n", __func__);
> + return;
> + }
> +
> + status = hv_map_msi_interrupt(dev, cpu, cfg->vector, &out_entry);
> + if (status != HV_STATUS_SUCCESS) {
> + kfree(stored_entry);
> + return;
> + }
> +
> + *stored_entry = out_entry;
> + data->chip_data = stored_entry;
> + entry_to_msi_msg(&out_entry, msg);
> +
> + return;
> +}
> +
> +static int hv_unmap_msi_interrupt(struct pci_dev *dev, struct hv_interrupt_entry *old_entry)
> +{
> + return hv_unmap_interrupt(hv_build_pci_dev_id(dev).as_uint64, old_entry);
> +}
> +
> +static void hv_teardown_msi_irq_common(struct pci_dev *dev, struct msi_desc *msidesc, int irq)
> +{
> + u64 status;
> + struct hv_interrupt_entry old_entry;
> + struct irq_desc *desc;
> + struct irq_data *data;
> + struct msi_msg msg;
> +
> + desc = irq_to_desc(irq);
> + if (!desc) {
> + pr_debug("%s: no irq desc\n", __func__);
> + return;
> + }
> +
> + data = &desc->irq_data;
> + if (!data) {
> + pr_debug("%s: no irq data\n", __func__);
> + return;
> + }
> +
> + if (!data->chip_data) {
> + pr_debug("%s: no chip data!\n", __func__);
> + return;
> + }
> +
> + old_entry = *(struct hv_interrupt_entry *)data->chip_data;
> + entry_to_msi_msg(&old_entry, &msg);
> +
> + kfree(data->chip_data);
> + data->chip_data = NULL;
> +
> + status = hv_unmap_msi_interrupt(dev, &old_entry);
> +
> + if (status != HV_STATUS_SUCCESS) {
> + pr_err("%s: hypercall failed, status %lld\n", __func__, status);
> + return;
> + }
> +}
> +
> +static void hv_msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
> +{
> + int i;
> + struct msi_desc *entry;
> + struct pci_dev *pdev;
> +
> + if (WARN_ON_ONCE(!dev_is_pci(dev)))
> + return;
> +
> + pdev = to_pci_dev(dev);
> +
> + for_each_pci_msi_entry(entry, pdev) {
> + if (entry->irq) {
> + for (i = 0; i < entry->nvec_used; i++) {
> + hv_teardown_msi_irq_common(pdev, entry, entry->irq + i);
> + irq_domain_free_irqs(entry->irq + i, 1);
> + }
> + }
> + }
> +}
> +
> +/*
> + * IRQ Chip for MSI PCI/PCI-X/PCI-Express Devices,
> + * which implement the MSI or MSI-X Capability Structure.
> + */
> +static struct irq_chip hv_pci_msi_controller = {
> + .name = "HV-PCI-MSI",
> + .irq_unmask = pci_msi_unmask_irq,
> + .irq_mask = pci_msi_mask_irq,
> + .irq_ack = irq_chip_ack_parent,
> + .irq_retrigger = irq_chip_retrigger_hierarchy,
> + .irq_compose_msi_msg = hv_irq_compose_msi_msg,
> + .irq_set_affinity = msi_domain_set_affinity,
> + .flags = IRQCHIP_SKIP_SET_WAKE,
> +};
> +
> +static struct msi_domain_ops pci_msi_domain_ops = {
> + .domain_free_irqs = hv_msi_domain_free_irqs,
> + .msi_prepare = pci_msi_prepare,
> +};
> +
> +static struct msi_domain_info hv_pci_msi_domain_info = {
> + .flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
> + MSI_FLAG_PCI_MSIX,
> + .ops = &pci_msi_domain_ops,
> + .chip = &hv_pci_msi_controller,
> + .handler = handle_edge_irq,
> + .handler_name = "edge",
> +};
> +
> +struct irq_domain * __init hv_create_pci_msi_domain(void)
> +{
> + struct irq_domain *d = NULL;
> + struct fwnode_handle *fn;
> +
> + fn = irq_domain_alloc_named_fwnode("HV-PCI-MSI");
> + if (fn)
> + d = pci_msi_create_irq_domain(fn, &hv_pci_msi_domain_info, x86_vector_domain);
> +
> + /* No point in going further if we can't get an irq domain */
> + BUG_ON(!d);
> +
> + return d;
> +}
> +
> +#endif /* CONFIG_PCI_MSI */
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index cbee72550a12..ccc849e25d5e 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -261,6 +261,8 @@ static inline void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry,
> msi_entry->data.as_uint32 = msi_desc->msg.data;
> }
>
> +struct irq_domain *hv_create_pci_msi_domain(void);
> +
> #else /* CONFIG_HYPERV */
> static inline void hyperv_init(void) {}
> static inline void hyperv_setup_mmu_ops(void) {}
> --
> 2.20.1

2021-02-04 17:57:37

by Michael Kelley (LINUX)

Subject: RE: [PATCH v6 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition

From: Wei Liu <[email protected]> Sent: Wednesday, February 3, 2021 7:05 AM
>
> Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
> Hypervisor when Linux runs as the root partition. Implement an IRQ
> domain to handle mapping and unmapping of IO-APIC interrupts.
>
> Signed-off-by: Wei Liu <[email protected]>
> ---
> v6:
> 1. Simplify code due to changes in a previous patch.
> ---
> arch/x86/hyperv/irqdomain.c | 25 +++++
> arch/x86/include/asm/mshyperv.h | 4 +
> drivers/iommu/hyperv-iommu.c | 177 +++++++++++++++++++++++++++++++-
> 3 files changed, 203 insertions(+), 3 deletions(-)
>

Reviewed-by: Michael Kelley <[email protected]>

2021-02-05 01:03:58

by Michael Kelley (LINUX)

Subject: RE: [PATCH v6 02/16] x86/hyperv: detect if Linux is the root partition

From: Wei Liu <[email protected]> Sent: Wednesday, February 3, 2021 7:04 AM
>
> For now we can use the privilege flag to check. Stash the value to be
> used later.
>
> Put in a bunch of defines for future use when we want to have more
> fine-grained detection.
>
> Signed-off-by: Wei Liu <[email protected]>
> Reviewed-by: Pavel Tatashin <[email protected]>
> ---
> v3: move hv_root_partition to mshyperv.c
> ---
> arch/x86/include/asm/hyperv-tlfs.h | 10 ++++++++++
> arch/x86/include/asm/mshyperv.h | 2 ++
> arch/x86/kernel/cpu/mshyperv.c | 20 ++++++++++++++++++++
> 3 files changed, 32 insertions(+)
>
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index 6bf42aed387e..204010350604 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -21,6 +21,7 @@
> #define HYPERV_CPUID_FEATURES 0x40000003
> #define HYPERV_CPUID_ENLIGHTMENT_INFO 0x40000004
> #define HYPERV_CPUID_IMPLEMENT_LIMITS 0x40000005
> +#define HYPERV_CPUID_CPU_MANAGEMENT_FEATURES 0x40000007
> #define HYPERV_CPUID_NESTED_FEATURES 0x4000000A
>
> #define HYPERV_CPUID_VIRT_STACK_INTERFACE 0x40000081
> @@ -110,6 +111,15 @@
> /* Recommend using enlightened VMCS */
> #define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED BIT(14)
>
> +/*
> + * CPU management features identification.
> + * These are HYPERV_CPUID_CPU_MANAGEMENT_FEATURES.EAX bits.
> + */
> +#define HV_X64_START_LOGICAL_PROCESSOR BIT(0)
> +#define HV_X64_CREATE_ROOT_VIRTUAL_PROCESSOR BIT(1)
> +#define HV_X64_PERFORMANCE_COUNTER_SYNC BIT(2)
> +#define HV_X64_RESERVED_IDENTITY_BIT BIT(31)
> +
> /*
> * Virtual processor will never share a physical core with another virtual
> * processor, except for virtual processors that are reported as sibling SMT
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ffc289992d1b..ac2b0d110f03 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -237,6 +237,8 @@ int hyperv_fill_flush_guest_mapping_list(
> struct hv_guest_mapping_flush_list *flush,
> u64 start_gfn, u64 end_gfn);
>
> +extern bool hv_root_partition;
> +
> #ifdef CONFIG_X86_64
> void hv_apic_init(void);
> void __init hv_init_spinlocks(void);
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index f628e3dc150f..c376d191a260 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -32,6 +32,10 @@
> #include <asm/nmi.h>
> #include <clocksource/hyperv_timer.h>
>
> +/* Is Linux running as the root partition? */
> +bool hv_root_partition;
> +EXPORT_SYMBOL_GPL(hv_root_partition);
> +
> struct ms_hyperv_info ms_hyperv;
> EXPORT_SYMBOL_GPL(ms_hyperv);
>
> @@ -237,6 +241,22 @@ static void __init ms_hyperv_init_platform(void)
> pr_debug("Hyper-V: max %u virtual processors, %u logical processors\n",
> ms_hyperv.max_vp_index, ms_hyperv.max_lp_index);
>
> + /*
> + * Check CPU management privilege.
> + *
> + * To mirror what Windows does we should extract CPU management
> + * features and use the ReservedIdentityBit to detect if Linux is the
> + * root partition. But that requires negotiating CPU management
> + * interface (a process to be finalized).
> + *
> + * For now, use the privilege flag as the indicator for running as
> + * root.
> + */
> + if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_CPU_MANAGEMENT) {
> + hv_root_partition = true;
> + pr_info("Hyper-V: running as root partition\n");
> + }
> +
> /*
> * Extract host information.
> */
> --
> 2.20.1

Reviewed-by: Michael Kelley <[email protected]>
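
For readers following along, the detection in this patch boils down to
testing one bit of a CPUID output register. Below is a minimal userspace
sketch of that test. It takes the EBX value as a parameter instead of
executing CPUID, so it can be tried anywhere. The helper name and the
SKETCH_ macro are hypothetical, and the bit position (12) is an assumption
about where the CPU management privilege sits in EBX; it is not shown in
the patch above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: the CPU management privilege flag is bit 12 of the EBX
 * value returned by CPUID leaf 0x40000003 (HYPERV_CPUID_FEATURES).
 */
#define SKETCH_HV_CPU_MANAGEMENT (UINT32_C(1) << 12)

/*
 * Hypothetical helper: decide "running as root" from the EBX value of
 * the features leaf. The caller is expected to have read CPUID already.
 */
static bool sketch_is_root_partition(uint32_t features_ebx)
{
	return (features_ebx & SKETCH_HV_CPU_MANAGEMENT) != 0;
}
```

As the code comment in the patch notes, this privilege-flag test is a
stand-in until the CPU management interface negotiation is finalized.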

2021-02-05 01:11:17

by Wei Liu

Subject: Re: [PATCH v6 15/16] x86/hyperv: implement an MSI domain for root partition

On Thu, Feb 04, 2021 at 05:43:16PM +0000, Michael Kelley wrote:
[...]
> > remove_cpuhp_state:
> > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> > new file mode 100644
> > index 000000000000..117f17e8c88a
> > --- /dev/null
> > +++ b/arch/x86/hyperv/irqdomain.c
> > @@ -0,0 +1,362 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +/*
> > + * for Linux to run as the root partition on Microsoft Hypervisor.
>
> Nit: Looks like the initial word "Irqdomain" got dropped from the above
> comment line. But don't respin just for this.
>

I've added it back. Thanks.

> > +static int hv_map_interrupt(union hv_device_id device_id, bool level,
> > + int cpu, int vector, struct hv_interrupt_entry *entry)
> > +{
> > + struct hv_input_map_device_interrupt *input;
> > + struct hv_output_map_device_interrupt *output;
> > + struct hv_device_interrupt_descriptor *intr_desc;
> > + unsigned long flags;
> > + u64 status;
> > + cpumask_t mask = CPU_MASK_NONE;
> > + int nr_bank, var_size;
> > +
> > + local_irq_save(flags);
> > +
> > + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > + output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> > +
> > + intr_desc = &input->interrupt_descriptor;
> > + memset(input, 0, sizeof(*input));
> > + input->partition_id = hv_current_partition_id;
> > + input->device_id = device_id.as_uint64;
> > + intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> > + intr_desc->vector_count = 1;
> > + intr_desc->target.vector = vector;
> > +
> > + if (level)
> > + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> > + else
> > + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> > +
> > + cpumask_set_cpu(cpu, &mask);
> > + intr_desc->target.vp_set.valid_bank_mask = 0;
> > + intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> > + nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), &mask);
>
> There's a function get_cpu_mask() that returns a pointer to a cpumask with only
> the specified cpu set in the mask. It returns a const pointer to the correct entry
> in a pre-allocated array of all such cpumasks, so it's a lot more efficient than
> allocating and initializing a local cpumask instance on the stack.
>

That's nice.

I've got the following diff to fix both issues. If you're happy with the
changes, can you give your Reviewed-by? That saves a round of posting.

diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
index 0cabc9aece38..fa71db798465 100644
--- a/arch/x86/hyperv/irqdomain.c
+++ b/arch/x86/hyperv/irqdomain.c
@@ -1,7 +1,7 @@
// SPDX-License-Identifier: GPL-2.0

/*
- * for Linux to run as the root partition on Microsoft Hypervisor.
+ * Irqdomain for Linux to run as the root partition on Microsoft Hypervisor.
*
* Authors:
* Sunil Muthuswamy <[email protected]>
@@ -20,7 +20,7 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
struct hv_device_interrupt_descriptor *intr_desc;
unsigned long flags;
u64 status;
- cpumask_t mask = CPU_MASK_NONE;
+ const cpumask_t *mask;
int nr_bank, var_size;

local_irq_save(flags);
@@ -41,10 +41,10 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
else
intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;

- cpumask_set_cpu(cpu, &mask);
+ mask = cpumask_of(cpu);
intr_desc->target.vp_set.valid_bank_mask = 0;
intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K;
- nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), &mask);
+ nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), mask);
if (nr_bank < 0) {
local_irq_restore(flags);
pr_err("%s: unable to generate VP set\n", __func__);

2021-02-05 01:32:44

by Wei Liu

Subject: Re: [PATCH v6 08/16] ACPI / NUMA: add a stub function for node_to_pxm()

On Wed, Feb 03, 2021 at 03:04:27PM +0000, Wei Liu wrote:
> There is already a stub function for pxm_to_node but conversion to the
> other direction is missing.
>
> It will be used by Microsoft Hypervisor code later.
>
> Signed-off-by: Wei Liu <[email protected]>

Hi ACPI maintainers, if you're happy with this patch I can take it via
the hyperv-next tree, given the issue was discovered when node_to_pxm is
called in our code.

> ---
> v6: new
> ---
> include/acpi/acpi_numa.h | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/include/acpi/acpi_numa.h b/include/acpi/acpi_numa.h
> index a4c6ef809e27..40a91ce87e04 100644
> --- a/include/acpi/acpi_numa.h
> +++ b/include/acpi/acpi_numa.h
> @@ -30,6 +30,10 @@ static inline int pxm_to_node(int pxm)
> {
> return 0;
> }
> +static inline int node_to_pxm(int node)
> +{
> + return 0;
> +}
> #endif /* CONFIG_ACPI_NUMA */
>
> #ifdef CONFIG_ACPI_HMAT
> --
> 2.20.1
>
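
A small sketch of the stub pattern used by this patch, for anyone
unfamiliar with it: when the feature is compiled out, inline stubs keep
callers building without any #ifdef at the call site. SKETCH_ACPI_NUMA is
a hypothetical switch standing in for CONFIG_ACPI_NUMA, and all sketch_
names are made up for illustration.

```c
/* Hypothetical switch standing in for CONFIG_ACPI_NUMA; undefined here. */
#ifdef SKETCH_ACPI_NUMA
int sketch_pxm_to_node(int pxm);
int sketch_node_to_pxm(int node);
#else
/* Without NUMA support, everything maps to node 0 / proximity domain 0. */
static inline int sketch_pxm_to_node(int pxm)
{
	(void)pxm;
	return 0;
}

static inline int sketch_node_to_pxm(int node)
{
	(void)node;
	return 0;
}
#endif

/* Hypothetical caller: builds the same way with or without the switch. */
static int sketch_proximity_of(int node)
{
	return sketch_node_to_pxm(node);
}
```

Having stubs for both directions means a caller of either conversion
compiles when the feature is disabled.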

2021-02-05 01:32:52

by Michael Kelley (LINUX)

Subject: RE: [PATCH v6 15/16] x86/hyperv: implement an MSI domain for root partition

From: Wei Liu <[email protected]> Sent: Thursday, February 4, 2021 9:57 AM
>
> On Thu, Feb 04, 2021 at 05:43:16PM +0000, Michael Kelley wrote:
> [...]
> > > remove_cpuhp_state:
> > > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> > > new file mode 100644
> > > index 000000000000..117f17e8c88a
> > > --- /dev/null
> > > +++ b/arch/x86/hyperv/irqdomain.c
> > > @@ -0,0 +1,362 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +
> > > +/*
> > > + * for Linux to run as the root partition on Microsoft Hypervisor.
> >
> > Nit: Looks like the initial word "Irqdomain" got dropped from the above
> > comment line. But don't respin just for this.
> >
>
> I've added it back. Thanks.
>
> > > +static int hv_map_interrupt(union hv_device_id device_id, bool level,
> > > + int cpu, int vector, struct hv_interrupt_entry *entry)
> > > +{
> > > + struct hv_input_map_device_interrupt *input;
> > > + struct hv_output_map_device_interrupt *output;
> > > + struct hv_device_interrupt_descriptor *intr_desc;
> > > + unsigned long flags;
> > > + u64 status;
> > > + cpumask_t mask = CPU_MASK_NONE;
> > > + int nr_bank, var_size;
> > > +
> > > + local_irq_save(flags);
> > > +
> > > + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > > + output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> > > +
> > > + intr_desc = &input->interrupt_descriptor;
> > > + memset(input, 0, sizeof(*input));
> > > + input->partition_id = hv_current_partition_id;
> > > + input->device_id = device_id.as_uint64;
> > > + intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> > > + intr_desc->vector_count = 1;
> > > + intr_desc->target.vector = vector;
> > > +
> > > + if (level)
> > > + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> > > + else
> > > + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> > > +
> > > + cpumask_set_cpu(cpu, &mask);
> > > + intr_desc->target.vp_set.valid_bank_mask = 0;
> > > + intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> > > + nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), &mask);
> >
> > There's a function get_cpu_mask() that returns a pointer to a cpumask with only
> > the specified cpu set in the mask. It returns a const pointer to the correct entry
> > in a pre-allocated array of all such cpumasks, so it's a lot more efficient than
> > allocating and initializing a local cpumask instance on the stack.
> >
>
> That's nice.
>
> I've got the following diff to fix both issues. If you're happy with the
> changes, can you give your Reviewed-by? That saves a round of posting.
>
> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> index 0cabc9aece38..fa71db798465 100644
> --- a/arch/x86/hyperv/irqdomain.c
> +++ b/arch/x86/hyperv/irqdomain.c
> @@ -1,7 +1,7 @@
> // SPDX-License-Identifier: GPL-2.0
>
> /*
> - * for Linux to run as the root partition on Microsoft Hypervisor.
> + * Irqdomain for Linux to run as the root partition on Microsoft Hypervisor.
> *
> * Authors:
> * Sunil Muthuswamy <[email protected]>
> @@ -20,7 +20,7 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
> struct hv_device_interrupt_descriptor *intr_desc;
> unsigned long flags;
> u64 status;
> - cpumask_t mask = CPU_MASK_NONE;
> + const cpumask_t *mask;
> int nr_bank, var_size;
>
> local_irq_save(flags);
> @@ -41,10 +41,10 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
> else
> intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
>
> - cpumask_set_cpu(cpu, &mask);
> + mask = cpumask_of(cpu);
> intr_desc->target.vp_set.valid_bank_mask = 0;
> intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> - nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), &mask);
> + nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), mask);

Can you just do the following and get rid of the 'mask' local entirely?

nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), cpumask_of(cpu));

Either way,

Reviewed-by: Michael Kelley <[email protected]>

> if (nr_bank < 0) {
> local_irq_restore(flags);
> pr_err("%s: unable to generate VP set\n", __func__);
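
To illustrate what the single-CPU mask turns into, here is a rough
userspace sketch of encoding one virtual processor into a sparse set made
of 64-bit banks. It is not the kernel's cpumask_to_vpset(): the struct is
a simplified stand-in, all sketch_ names are hypothetical, the CPU number
is assumed to equal the VP index, and marking every bank up to the highest
one as valid is an assumption of this sketch.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_BANKS 64

/* Simplified stand-in for a sparse VP set; the layout is illustrative. */
struct sketch_vpset {
	uint64_t format;
	uint64_t valid_bank_mask;
	uint64_t bank_contents[SKETCH_MAX_BANKS];
};

/*
 * Encode a single VP index into the set.
 * Returns the number of banks in use, or -1 if the index is out of range.
 */
static int sketch_vp_to_vpset(struct sketch_vpset *set, unsigned int vp)
{
	unsigned int bank = vp / 64;	/* which 64-bit bank holds this VP */
	unsigned int offset = vp % 64;	/* bit position inside that bank */
	int nr_bank;

	if (bank >= SKETCH_MAX_BANKS)
		return -1;

	memset(set->bank_contents, 0, sizeof(set->bank_contents));
	set->bank_contents[bank] = UINT64_C(1) << offset;

	/* Assumption: banks 0..nr_bank-1 are all flagged as valid. */
	nr_bank = (int)bank + 1;
	set->valid_bank_mask = (nr_bank == 64) ?
		UINT64_MAX : ((UINT64_C(1) << nr_bank) - 1);

	return nr_bank;
}
```

With this layout, VP 70 lands in bank 1 at bit 6, so two banks are in use.
A negative return corresponds to the "unable to generate VP set" error
path quoted above.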

2021-02-05 01:33:02

by Wei Liu

Subject: Re: [PATCH v6 15/16] x86/hyperv: implement an MSI domain for root partition

On Thu, Feb 04, 2021 at 06:40:55PM +0000, Michael Kelley wrote:
> From: Wei Liu <[email protected]> Sent: Thursday, February 4, 2021 9:57 AM
[...]
> > I've got the following diff to fix both issues. If you're happy with the
> > changes, can you give your Reviewed-by? That saves a round of posting.
> >
> > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> > index 0cabc9aece38..fa71db798465 100644
> > --- a/arch/x86/hyperv/irqdomain.c
> > +++ b/arch/x86/hyperv/irqdomain.c
> > @@ -1,7 +1,7 @@
> > // SPDX-License-Identifier: GPL-2.0
> >
> > /*
> > - * for Linux to run as the root partition on Microsoft Hypervisor.
> > + * Irqdomain for Linux to run as the root partition on Microsoft Hypervisor.
> > *
> > * Authors:
> > * Sunil Muthuswamy <[email protected]>
> > @@ -20,7 +20,7 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
> > struct hv_device_interrupt_descriptor *intr_desc;
> > unsigned long flags;
> > u64 status;
> > - cpumask_t mask = CPU_MASK_NONE;
> > + const cpumask_t *mask;
> > int nr_bank, var_size;
> >
> > local_irq_save(flags);
> > @@ -41,10 +41,10 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
> > else
> > intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> >
> > - cpumask_set_cpu(cpu, &mask);
> > + mask = cpumask_of(cpu);
> > intr_desc->target.vp_set.valid_bank_mask = 0;
> > intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> > - nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), &mask);
> > + nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), mask);
>
> Can you just do the following and get rid of the 'mask' local entirely?
>
> nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), cpumask_of(cpu));

Sure. That can be done.

>
> Either way,
>
> Reviewed-by: Michael Kelley <[email protected]>

Thank you.

Wei.

2021-02-05 01:35:15

by Rafael J. Wysocki

Subject: Re: [PATCH v6 08/16] ACPI / NUMA: add a stub function for node_to_pxm()

On Thu, Feb 4, 2021 at 7:41 PM Wei Liu <[email protected]> wrote:
>
> On Wed, Feb 03, 2021 at 03:04:27PM +0000, Wei Liu wrote:
> > There is already a stub function for pxm_to_node but conversion to the
> > other direction is missing.
> >
> > It will be used by Microsoft Hypervisor code later.
> >
> > Signed-off-by: Wei Liu <[email protected]>
>
> Hi ACPI maintainers, if you're happy with this patch I can take it via
> the hyperv-next tree, given the issue was discovered when node_to_pxm is
> called in our code.

Yes, you can.

Thanks!

>
> > ---
> > v6: new
> > ---
> > include/acpi/acpi_numa.h | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/include/acpi/acpi_numa.h b/include/acpi/acpi_numa.h
> > index a4c6ef809e27..40a91ce87e04 100644
> > --- a/include/acpi/acpi_numa.h
> > +++ b/include/acpi/acpi_numa.h
> > @@ -30,6 +30,10 @@ static inline int pxm_to_node(int pxm)
> > {
> > return 0;
> > }
> > +static inline int node_to_pxm(int node)
> > +{
> > + return 0;
> > +}
> > #endif /* CONFIG_ACPI_NUMA */
> >
> > #ifdef CONFIG_ACPI_HMAT
> > --
> > 2.20.1
> >

2021-02-05 01:35:33

by Wei Liu

Subject: Re: [PATCH v6 08/16] ACPI / NUMA: add a stub function for node_to_pxm()

On Thu, Feb 04, 2021 at 07:45:25PM +0100, Rafael J. Wysocki wrote:
> On Thu, Feb 4, 2021 at 7:41 PM Wei Liu <[email protected]> wrote:
> >
> > On Wed, Feb 03, 2021 at 03:04:27PM +0000, Wei Liu wrote:
> > > There is already a stub function for pxm_to_node but conversion to the
> > > other direction is missing.
> > >
> > > It will be used by Microsoft Hypervisor code later.
> > >
> > > Signed-off-by: Wei Liu <[email protected]>
> >
> > Hi ACPI maintainers, if you're happy with this patch I can take it via
> > the hyperv-next tree, given the issue was discovered when node_to_pxm is
> > called in our code.
>
> Yes, you can.

Thanks Rafael. I will add your ack to the patch as well.

Wei.

2021-02-05 01:45:53

by Wei Liu

Subject: Re: [PATCH v6 00/16] Introducing Linux root partition support for Microsoft Hypervisor

On Wed, Feb 03, 2021 at 03:04:19PM +0000, Wei Liu wrote:
> Wei Liu (16):
> asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to
> HV_CPU_MANAGEMENT
> x86/hyperv: detect if Linux is the root partition
> Drivers: hv: vmbus: skip VMBus initialization if Linux is root
> clocksource/hyperv: use MSR-based access if running as root
> x86/hyperv: allocate output arg pages if required
> x86/hyperv: extract partition ID from Microsoft Hypervisor if
> necessary
> x86/hyperv: handling hypercall page setup for root
> ACPI / NUMA: add a stub function for node_to_pxm()
> x86/hyperv: provide a bunch of helper functions
> x86/hyperv: implement and use hv_smp_prepare_cpus
> asm-generic/hyperv: update hv_msi_entry
> asm-generic/hyperv: update hv_interrupt_entry
> asm-generic/hyperv: introduce hv_device_id and auxiliary structures
> asm-generic/hyperv: import data structures for mapping device
> interrupts
> x86/hyperv: implement an MSI domain for root partition
> iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition

This series is now rebased and pushed to hyperv-next.

Many thanks to all the people that provided reviews and comments.

Wei.