2020-11-05 17:04:23

by Wei Liu

Subject: [PATCH v2 00/17] Introducing Linux root partition support for Microsoft Hypervisor

Hi all

Here we propose this patch series to make Linux run as the root partition [0]
on Microsoft Hypervisor [1]. There will be a subsequent patch series to provide a
device node (/dev/mshv) such that userspace programs can create and run virtual
machines. We've also ported Cloud Hypervisor [3] over and have been able to
boot a Linux guest with Virtio devices since late July.

Being an RFC series, this implements only the absolutely necessary
components to get things running. I will break down this series a bit.

A large portion of this series consists of patches that augment hyperv-tlfs.h.
They should be rather uncontroversial and can be applied right away.

A few key things other than the changes to hyperv-tlfs.h:

1. Linux needs to set up existing Hyper-V facilities differently.
2. Linux needs to make a few hypercalls to bring up APs.
3. Interrupts are remapped by the IOMMU, which is controlled by the hypervisor.
Linux needs to make hypercalls to map and unmap interrupts. This is
done by introducing a new MSI irqdomain and new irqchips.

This series is now based on 5.10-rc1. And thanks to tglx's overhaul of
the MSI code, our implementation of the MSI irq domain is shorter.

Comments and suggestions are welcome.

Thanks,
Wei.

Changes since v1:
1. Simplify MSI IRQ domain implementation.
2. Address Vitaly's comments.

Wei Liu (17):
asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to
HV_CPU_MANAGEMENT
x86/hyperv: detect if Linux is the root partition
Drivers: hv: vmbus: skip VMBus initialization if Linux is root
iommu/hyperv: don't setup IRQ remapping when running as root
clocksource/hyperv: use MSR-based access if running as root
x86/hyperv: allocate output arg pages if required
x86/hyperv: extract partition ID from Microsoft Hypervisor if
necessary
x86/hyperv: handling hypercall page setup for root
x86/hyperv: provide a bunch of helper functions
x86/hyperv: implement and use hv_smp_prepare_cpus
asm-generic/hyperv: update hv_msi_entry
asm-generic/hyperv: update hv_interrupt_entry
asm-generic/hyperv: introduce hv_device_id and auxiliary structures
asm-generic/hyperv: import data structures for mapping device
interrupts
x86/hyperv: implement an MSI domain for root partition
x86/ioapic: export a few functions and data structures via io_apic.h
x86/hyperv: handle IO-APIC when running as root

arch/x86/hyperv/Makefile | 2 +-
arch/x86/hyperv/hv_init.c | 118 +++++-
arch/x86/hyperv/hv_proc.c | 217 +++++++++++
arch/x86/hyperv/irqdomain.c | 556 ++++++++++++++++++++++++++++
arch/x86/include/asm/hyperv-tlfs.h | 23 ++
arch/x86/include/asm/io_apic.h | 21 ++
arch/x86/include/asm/mshyperv.h | 13 +-
arch/x86/kernel/apic/io_apic.c | 28 +-
arch/x86/kernel/cpu/mshyperv.c | 43 +++
drivers/clocksource/hyperv_timer.c | 3 +
drivers/hv/vmbus_drv.c | 3 +
drivers/iommu/hyperv-iommu.c | 3 +-
drivers/pci/controller/pci-hyperv.c | 2 +-
include/asm-generic/hyperv-tlfs.h | 254 ++++++++++++-
14 files changed, 1250 insertions(+), 36 deletions(-)
create mode 100644 arch/x86/hyperv/hv_proc.c
create mode 100644 arch/x86/hyperv/irqdomain.c

--
2.20.1


2020-11-05 17:04:45

by Wei Liu

Subject: [PATCH v2 06/17] x86/hyperv: allocate output arg pages if required

When Linux runs as the root partition, it will need to make hypercalls
which return data from the hypervisor.

Allocate pages for storing results when Linux runs as the root
partition.

Signed-off-by: Lillian Grassin-Drake <[email protected]>
Co-Developed-by: Lillian Grassin-Drake <[email protected]>
Signed-off-by: Wei Liu <[email protected]>
---
v2: Address Vitaly's comments
---
arch/x86/hyperv/hv_init.c | 35 ++++++++++++++++++++++++++++-----
arch/x86/include/asm/mshyperv.h | 1 +
2 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 533fe9e887f2..7a2e37f025b0 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -45,6 +45,9 @@ EXPORT_SYMBOL_GPL(hv_vp_assist_page);
void __percpu **hyperv_pcpu_input_arg;
EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);

+void __percpu **hyperv_pcpu_output_arg;
+EXPORT_SYMBOL_GPL(hyperv_pcpu_output_arg);
+
u32 hv_max_vp_index;
EXPORT_SYMBOL_GPL(hv_max_vp_index);

@@ -77,12 +80,19 @@ static int hv_cpu_init(unsigned int cpu)
void **input_arg;
struct page *pg;

- input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
- pg = alloc_page(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL);
+ pg = alloc_pages(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL, hv_root_partition ? 1 : 0);
if (unlikely(!pg))
return -ENOMEM;
+
+ input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
*input_arg = page_address(pg);
+ if (hv_root_partition) {
+ void **output_arg;
+
+ output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
+ *output_arg = page_address(pg + 1);
+ }

hv_get_vp_index(msr_vp_index);

@@ -209,14 +219,23 @@ static int hv_cpu_die(unsigned int cpu)
unsigned int new_cpu;
unsigned long flags;
void **input_arg;
- void *input_pg = NULL;
+ void *pg;

local_irq_save(flags);
input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
- input_pg = *input_arg;
+ pg = *input_arg;
*input_arg = NULL;
+
+ if (hv_root_partition) {
+ void **output_arg;
+
+ output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
+ *output_arg = NULL;
+ }
+
local_irq_restore(flags);
- free_page((unsigned long)input_pg);
+
+ free_page((unsigned long)pg);

if (hv_vp_assist_page && hv_vp_assist_page[cpu])
wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
@@ -350,6 +369,12 @@ void __init hyperv_init(void)

BUG_ON(hyperv_pcpu_input_arg == NULL);

+ /* Allocate the per-CPU state for output arg for root */
+ if (hv_root_partition) {
+ hyperv_pcpu_output_arg = alloc_percpu(void *);
+ BUG_ON(hyperv_pcpu_output_arg == NULL);
+ }
+
/* Allocate percpu VP index */
hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index),
GFP_KERNEL);
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index ac2b0d110f03..62d9390f1ddf 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -76,6 +76,7 @@ static inline void hv_disable_stimer0_percpu_irq(int irq) {}
#if IS_ENABLED(CONFIG_HYPERV)
extern void *hv_hypercall_pg;
extern void __percpu **hyperv_pcpu_input_arg;
+extern void __percpu **hyperv_pcpu_output_arg;

static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
{
--
2.20.1

2020-11-12 15:38:22

by Vitaly Kuznetsov

Subject: Re: [PATCH v2 06/17] x86/hyperv: allocate output arg pages if required

Wei Liu <[email protected]> writes:

> When Linux runs as the root partition, it will need to make hypercalls
> which return data from the hypervisor.
>
> Allocate pages for storing results when Linux runs as the root
> partition.
>
> Signed-off-by: Lillian Grassin-Drake <[email protected]>
> Co-Developed-by: Lillian Grassin-Drake <[email protected]>
> Signed-off-by: Wei Liu <[email protected]>
> ---
> v2: Address Vitaly's comments
> ---
> arch/x86/hyperv/hv_init.c | 35 ++++++++++++++++++++++++++++-----
> arch/x86/include/asm/mshyperv.h | 1 +
> 2 files changed, 31 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 533fe9e887f2..7a2e37f025b0 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -45,6 +45,9 @@ EXPORT_SYMBOL_GPL(hv_vp_assist_page);
> void __percpu **hyperv_pcpu_input_arg;
> EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
>
> +void __percpu **hyperv_pcpu_output_arg;
> +EXPORT_SYMBOL_GPL(hyperv_pcpu_output_arg);
> +
> u32 hv_max_vp_index;
> EXPORT_SYMBOL_GPL(hv_max_vp_index);
>
> @@ -77,12 +80,19 @@ static int hv_cpu_init(unsigned int cpu)
> void **input_arg;
> struct page *pg;
>
> - input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> /* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
> - pg = alloc_page(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL);
> + pg = alloc_pages(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL, hv_root_partition ? 1 : 0);
> if (unlikely(!pg))
> return -ENOMEM;
> +
> + input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> *input_arg = page_address(pg);
> + if (hv_root_partition) {
> + void **output_arg;
> +
> + output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> + *output_arg = page_address(pg + 1);
> + }
>
> hv_get_vp_index(msr_vp_index);
>
> @@ -209,14 +219,23 @@ static int hv_cpu_die(unsigned int cpu)
> unsigned int new_cpu;
> unsigned long flags;
> void **input_arg;
> - void *input_pg = NULL;
> + void *pg;
>
> local_irq_save(flags);
> input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> - input_pg = *input_arg;
> + pg = *input_arg;
> *input_arg = NULL;
> +
> + if (hv_root_partition) {
> + void **output_arg;
> +
> + output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> + *output_arg = NULL;
> + }
> +
> local_irq_restore(flags);
> - free_page((unsigned long)input_pg);
> +
> + free_page((unsigned long)pg);
>

Hm, but in case we've allocated output_arg, don't we need to do
free_pages((unsigned long)pg, 1);

instead?

> if (hv_vp_assist_page && hv_vp_assist_page[cpu])
> wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
> @@ -350,6 +369,12 @@ void __init hyperv_init(void)
>
> BUG_ON(hyperv_pcpu_input_arg == NULL);
>
> + /* Allocate the per-CPU state for output arg for root */
> + if (hv_root_partition) {
> + hyperv_pcpu_output_arg = alloc_percpu(void *);
> + BUG_ON(hyperv_pcpu_output_arg == NULL);
> + }
> +
> /* Allocate percpu VP index */
> hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index),
> GFP_KERNEL);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ac2b0d110f03..62d9390f1ddf 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -76,6 +76,7 @@ static inline void hv_disable_stimer0_percpu_irq(int irq) {}
> #if IS_ENABLED(CONFIG_HYPERV)
> extern void *hv_hypercall_pg;
> extern void __percpu **hyperv_pcpu_input_arg;
> +extern void __percpu **hyperv_pcpu_output_arg;
>
> static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
> {

--
Vitaly

2020-11-13 15:09:29

by Wei Liu

Subject: Re: [PATCH v2 06/17] x86/hyperv: allocate output arg pages if required

On Thu, Nov 12, 2020 at 04:35:48PM +0100, Vitaly Kuznetsov wrote:
> Wei Liu <[email protected]> writes:
[...]
> > @@ -209,14 +219,23 @@ static int hv_cpu_die(unsigned int cpu)
> > unsigned int new_cpu;
> > unsigned long flags;
> > void **input_arg;
> > - void *input_pg = NULL;
> > + void *pg;
> >
> > local_irq_save(flags);
> > input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> > - input_pg = *input_arg;
> > + pg = *input_arg;
> > *input_arg = NULL;
> > +
> > + if (hv_root_partition) {
> > + void **output_arg;
> > +
> > + output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> > + *output_arg = NULL;
> > + }
> > +
> > local_irq_restore(flags);
> > - free_page((unsigned long)input_pg);
> > +
> > + free_page((unsigned long)pg);
> >
>
> Hm, but in case we've allocated output_arg, don't we need to do
> free_pages((unsigned long)pg, 1);
>
> instead?

Indeed. This has been fixed with:

free_pages((unsigned long)pg, hv_root_partition ? 1 : 0);

Wei.
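
For readers following the thread, the pairing between the allocation and
free paths can be modelled in userspace. What follows is a minimal sketch
and not kernel code: malloc() stands in for alloc_pages(), free() stands in
for free_pages(), and every sketch_* name is hypothetical. It assumes a
4096-byte page. It shows the root case allocating an order-1 block (two
pages), using the second page for the output argument, and deriving the same
order again on the free path, which is what the free_pages() fix above
amounts to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for this model */

/* Hypothetical stand-in for the per-CPU input/output argument pointers. */
struct sketch_cpu_args {
	void *input_arg;
	void *output_arg;
	unsigned int order;	/* order actually used at allocation time */
};

/* Root needs two pages (order 1); a guest needs one page (order 0). */
static unsigned int sketch_order(int is_root)
{
	return is_root ? 1 : 0;
}

/* Models hv_cpu_init(): one allocation covers both argument pages. */
static int sketch_cpu_init(struct sketch_cpu_args *args, int is_root)
{
	unsigned int order = sketch_order(is_root);
	unsigned char *pg = malloc((size_t)SKETCH_PAGE_SIZE << order);

	if (!pg)
		return -1;

	args->input_arg = pg;
	/* The output page is the second page of the same block. */
	args->output_arg = is_root ? pg + SKETCH_PAGE_SIZE : NULL;
	args->order = order;
	return 0;
}

/*
 * Models hv_cpu_die(): the free path recomputes the order from the same
 * condition used at allocation time. Returns the number of bytes released.
 */
static size_t sketch_cpu_die(struct sketch_cpu_args *args, int is_root)
{
	unsigned int order = sketch_order(is_root);
	void *pg = args->input_arg;

	/* A mismatch here is the bug raised in review. */
	assert(order == args->order);

	args->input_arg = NULL;
	args->output_arg = NULL;
	free(pg);
	return (size_t)SKETCH_PAGE_SIZE << order;
}
```

Only the input pointer is passed to the free call, because the output page
is not a separate allocation. In the kernel the order has to be supplied
explicitly at free time, hence the hv_root_partition ? 1 : 0 expression
appearing on both paths.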