Hi,
This is the RFC patchset for vDSO support in ARM64 Hyper-V guest. To
test it, Michael's ARM64 support patchset:
https://lore.kernel.org/linux-arm-kernel/[email protected]/
is needed.
Similar as x86, Hyper-V on ARM64 use a TSC page for guests to read
the virtualized hardware timer, this TSC page is read-only for the
guests, so could be used for vDSO data page. And the vDSO (userspace)
code could use the same code for timer reading as kernel, since
they read the same TSC page.
This patchset therefore extends ARM64's __vsdo_init() to allow multiple
data pages and introduces the vclock_mode concept similar to x86 to
allow different platforms (bare-metal, Hyper-V, etc.) to switch to
different __arch_get_hw_counter() implementations. The rest of this
patchset does the necessary setup for Hyper-V guests: mapping tsc page,
enabling userspace to read cntvct, etc. to enable vDSO.
This patchset consists of 6 patches:
patch #1 allows hv_get_raw_timer() definition to be overridden for
userspace and kernel to share the same hv_read_tsc_page() definition.
patch #2 extends ARM64 to support multiple vDSO data pages.
patch #3 introduces vclock_mode similiar to x86 to allow different
__arch_get_hw_counter() implementations for different clocksources.
patch #4 maps Hyper-V TSC page into vDSO data page.
patch #5 allows userspace to read cntvct, so that userspace can
efficiently read the clocksource.
patch #6 enables the vDSO for ARM64 Hyper-V guest.
The whole patchset is based on v5.5-rc1 plus Michael's ARM64 support
patchset, and I've done a few tests with:
https://github.com/nlynch-mentor/vdsotest
Comments and suggestions are welcome!
Regards,
Boqun
Split __vdso_abi::vdso_pages into nr_vdso_{data,code}_pages, so that
__setup_additional_pages() could work with multiple vDSO data pages with
the setup from __vdso_init().
Multiple vDSO data pages are required when running in a virtualized
environment, where the cycles read from cntvct at userspace need to
be adjusted with some data from a page maintained by the hypervisor. For
example, the TSC page in Hyper-V.
This is a prerequisite for vDSO support in ARM64 on Hyper-V.
Signed-off-by: Boqun Feng (Microsoft) <[email protected]>
---
arch/arm64/kernel/vdso.c | 43 ++++++++++++++++++++++++----------------
1 file changed, 26 insertions(+), 17 deletions(-)
diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index 354b11e27c07..b9b5ec7a3084 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -50,7 +50,8 @@ struct __vdso_abi {
const char *name;
const char *vdso_code_start;
const char *vdso_code_end;
- unsigned long vdso_pages;
+ unsigned long nr_vdso_data_pages;
+ unsigned long nr_vdso_code_pages;
/* Data Mapping */
struct vm_special_mapping *dm;
/* Code Mapping */
@@ -101,6 +102,8 @@ static int __vdso_init(enum arch_vdso_type arch_index)
{
int i;
struct page **vdso_pagelist;
+ struct page **vdso_code_pagelist;
+ unsigned long nr_vdso_pages;
unsigned long pfn;
if (memcmp(vdso_lookup[arch_index].vdso_code_start, "\177ELF", 4)) {
@@ -108,14 +111,18 @@ static int __vdso_init(enum arch_vdso_type arch_index)
return -EINVAL;
}
- vdso_lookup[arch_index].vdso_pages = (
+ vdso_lookup[arch_index].nr_vdso_data_pages = 1;
+
+ vdso_lookup[arch_index].nr_vdso_code_pages = (
vdso_lookup[arch_index].vdso_code_end -
vdso_lookup[arch_index].vdso_code_start) >>
PAGE_SHIFT;
- /* Allocate the vDSO pagelist, plus a page for the data. */
- vdso_pagelist = kcalloc(vdso_lookup[arch_index].vdso_pages + 1,
- sizeof(struct page *),
+ nr_vdso_pages = vdso_lookup[arch_index].nr_vdso_data_pages +
+ vdso_lookup[arch_index].nr_vdso_code_pages;
+
+ /* Allocate the vDSO pagelist. */
+ vdso_pagelist = kcalloc(nr_vdso_pages, sizeof(struct page *),
GFP_KERNEL);
if (vdso_pagelist == NULL)
return -ENOMEM;
@@ -123,15 +130,17 @@ static int __vdso_init(enum arch_vdso_type arch_index)
/* Grab the vDSO data page. */
vdso_pagelist[0] = phys_to_page(__pa_symbol(vdso_data));
-
/* Grab the vDSO code pages. */
pfn = sym_to_pfn(vdso_lookup[arch_index].vdso_code_start);
- for (i = 0; i < vdso_lookup[arch_index].vdso_pages; i++)
- vdso_pagelist[i + 1] = pfn_to_page(pfn + i);
+ vdso_code_pagelist = vdso_pagelist +
+ vdso_lookup[arch_index].nr_vdso_data_pages;
+
+ for (i = 0; i < vdso_lookup[arch_index].nr_vdso_code_pages; i++)
+ vdso_code_pagelist[i] = pfn_to_page(pfn + i);
- vdso_lookup[arch_index].dm->pages = &vdso_pagelist[0];
- vdso_lookup[arch_index].cm->pages = &vdso_pagelist[1];
+ vdso_lookup[arch_index].dm->pages = vdso_pagelist;
+ vdso_lookup[arch_index].cm->pages = vdso_code_pagelist;
return 0;
}
@@ -141,26 +150,26 @@ static int __setup_additional_pages(enum arch_vdso_type arch_index,
struct linux_binprm *bprm,
int uses_interp)
{
- unsigned long vdso_base, vdso_text_len, vdso_mapping_len;
+ unsigned long vdso_base, vdso_text_len, vdso_data_len;
void *ret;
- vdso_text_len = vdso_lookup[arch_index].vdso_pages << PAGE_SHIFT;
- /* Be sure to map the data page */
- vdso_mapping_len = vdso_text_len + PAGE_SIZE;
+ vdso_data_len = vdso_lookup[arch_index].nr_vdso_data_pages << PAGE_SHIFT;
+ vdso_text_len = vdso_lookup[arch_index].nr_vdso_code_pages << PAGE_SHIFT;
- vdso_base = get_unmapped_area(NULL, 0, vdso_mapping_len, 0, 0);
+ vdso_base = get_unmapped_area(NULL, 0,
+ vdso_data_len + vdso_text_len, 0, 0);
if (IS_ERR_VALUE(vdso_base)) {
ret = ERR_PTR(vdso_base);
goto up_fail;
}
- ret = _install_special_mapping(mm, vdso_base, PAGE_SIZE,
+ ret = _install_special_mapping(mm, vdso_base, vdso_data_len,
VM_READ|VM_MAYREAD,
vdso_lookup[arch_index].dm);
if (IS_ERR(ret))
goto up_fail;
- vdso_base += PAGE_SIZE;
+ vdso_base += vdso_data_len;
mm->context.vdso = (void *)vdso_base;
ret = _install_special_mapping(mm, vdso_base, vdso_text_len,
VM_READ|VM_EXEC|
--
2.24.0
Similar to x86, use a vclock_mode in arch_clocksource_data to differ
clocksoures use different read function in vDSO.
No functional changes, only preparation for support vDSO in ARM64 on
Hyper-V.
Note: the changes for arm are only because arm and arm64 share the same
code in the arch timer driver and require arch_clocksource_data having
the same field.
Signed-off-by: Boqun Feng (Microsoft) <[email protected]>
---
arch/arm/include/asm/clocksource.h | 6 +++++-
arch/arm/kernel/vdso.c | 1 -
arch/arm64/include/asm/clocksource.h | 6 +++++-
arch/arm64/include/asm/mshyperv.h | 2 +-
arch/arm64/include/asm/vdso/compat_gettimeofday.h | 5 +++--
arch/arm64/include/asm/vdso/gettimeofday.h | 5 +++--
arch/arm64/include/asm/vdso/vsyscall.h | 4 +---
drivers/clocksource/arm_arch_timer.c | 8 ++++----
8 files changed, 22 insertions(+), 15 deletions(-)
diff --git a/arch/arm/include/asm/clocksource.h b/arch/arm/include/asm/clocksource.h
index 0b350a7e26f3..017c5ab6e587 100644
--- a/arch/arm/include/asm/clocksource.h
+++ b/arch/arm/include/asm/clocksource.h
@@ -1,8 +1,12 @@
#ifndef _ASM_CLOCKSOURCE_H
#define _ASM_CLOCKSOURCE_H
+#define VCLOCK_NONE 0 /* No vDSO clock available. */
+#define VCLOCK_CNTVCT 1 /* vDSO should use cntvcnt */
+#define VCLOCK_MAX 1
+
struct arch_clocksource_data {
- bool vdso_direct; /* Usable for direct VDSO access? */
+ int vclock_mode;
};
#endif
diff --git a/arch/arm/kernel/vdso.c b/arch/arm/kernel/vdso.c
index c89ac1b9d28b..09e46ec420fe 100644
--- a/arch/arm/kernel/vdso.c
+++ b/arch/arm/kernel/vdso.c
@@ -263,4 +263,3 @@ void arm_install_vdso(struct mm_struct *mm, unsigned long addr)
if (!IS_ERR(vma))
mm->context.vdso = addr;
}
-
diff --git a/arch/arm64/include/asm/clocksource.h b/arch/arm64/include/asm/clocksource.h
index 0ece64a26c8c..fbe80057468c 100644
--- a/arch/arm64/include/asm/clocksource.h
+++ b/arch/arm64/include/asm/clocksource.h
@@ -2,8 +2,12 @@
#ifndef _ASM_CLOCKSOURCE_H
#define _ASM_CLOCKSOURCE_H
+#define VCLOCK_NONE 0 /* No vDSO clock available. */
+#define VCLOCK_CNTVCT 1 /* vDSO should use cntvcnt */
+#define VCLOCK_MAX 1
+
struct arch_clocksource_data {
- bool vdso_direct; /* Usable for direct VDSO access? */
+ int vclock_mode;
};
#endif
diff --git a/arch/arm64/include/asm/mshyperv.h b/arch/arm64/include/asm/mshyperv.h
index 9cc4aeddf2d0..0afb00e3501d 100644
--- a/arch/arm64/include/asm/mshyperv.h
+++ b/arch/arm64/include/asm/mshyperv.h
@@ -90,7 +90,7 @@ extern void hv_get_vpreg_128(u32 reg, struct hv_get_vp_register_output *result);
#define hv_set_reference_tsc(val) \
hv_set_vpreg(HV_REGISTER_REFERENCE_TSC, val)
#define hv_set_clocksource_vdso(val) \
- ((val).archdata.vdso_direct = false)
+ ((val).archdata.vclock_mode = VCLOCK_NONE)
#if IS_ENABLED(CONFIG_HYPERV)
#define hv_enable_stimer0_percpu_irq(irq) enable_percpu_irq(irq, 0)
diff --git a/arch/arm64/include/asm/vdso/compat_gettimeofday.h b/arch/arm64/include/asm/vdso/compat_gettimeofday.h
index c50ee1b7d5cd..630d04c3c92e 100644
--- a/arch/arm64/include/asm/vdso/compat_gettimeofday.h
+++ b/arch/arm64/include/asm/vdso/compat_gettimeofday.h
@@ -8,6 +8,7 @@
#ifndef __ASSEMBLY__
#include <asm/unistd.h>
+#include <asm/clocksource.h>
#include <uapi/linux/time.h>
#include <asm/vdso/compat_barrier.h>
@@ -117,10 +118,10 @@ static __always_inline u64 __arch_get_hw_counter(s32 clock_mode)
u64 res;
/*
- * clock_mode == 0 implies that vDSO are enabled otherwise
+ * clock_mode == VCLOCK_NONE implies that vDSO are disabled so
* fallback on syscall.
*/
- if (clock_mode)
+ if (clock_mode == VCLOCK_NONE)
return __VDSO_USE_SYSCALL;
/*
diff --git a/arch/arm64/include/asm/vdso/gettimeofday.h b/arch/arm64/include/asm/vdso/gettimeofday.h
index b08f476b72b4..e6e3fe0488c7 100644
--- a/arch/arm64/include/asm/vdso/gettimeofday.h
+++ b/arch/arm64/include/asm/vdso/gettimeofday.h
@@ -8,6 +8,7 @@
#ifndef __ASSEMBLY__
#include <asm/unistd.h>
+#include <asm/clocksource.h>
#include <uapi/linux/time.h>
#define __VDSO_USE_SYSCALL ULLONG_MAX
@@ -71,10 +72,10 @@ static __always_inline u64 __arch_get_hw_counter(s32 clock_mode)
u64 res;
/*
- * clock_mode == 0 implies that vDSO are enabled otherwise
+ * clock_mode == VCLOCK_NONE implies that vDSO are disabled so
* fallback on syscall.
*/
- if (clock_mode)
+ if (clock_mode == VCLOCK_NONE)
return __VDSO_USE_SYSCALL;
/*
diff --git a/arch/arm64/include/asm/vdso/vsyscall.h b/arch/arm64/include/asm/vdso/vsyscall.h
index 0c20a7c1bee5..07f78b0da498 100644
--- a/arch/arm64/include/asm/vdso/vsyscall.h
+++ b/arch/arm64/include/asm/vdso/vsyscall.h
@@ -24,9 +24,7 @@ struct vdso_data *__arm64_get_k_vdso_data(void)
static __always_inline
int __arm64_get_clock_mode(struct timekeeper *tk)
{
- u32 use_syscall = !tk->tkr_mono.clock->archdata.vdso_direct;
-
- return use_syscall;
+ return tk->tkr_mono.clock->archdata.vclock_mode;
}
#define __arch_get_clock_mode __arm64_get_clock_mode
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index 9a5464c625b4..9b8d4d00b53b 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -69,7 +69,7 @@ static enum arch_timer_ppi_nr arch_timer_uses_ppi = ARCH_TIMER_VIRT_PPI;
static bool arch_timer_c3stop;
static bool arch_timer_mem_use_virtual;
static bool arch_counter_suspend_stop;
-static bool vdso_default = true;
+static int vdso_default = VCLOCK_CNTVCT;
static cpumask_t evtstrm_available = CPU_MASK_NONE;
static bool evtstrm_enable = IS_ENABLED(CONFIG_ARM_ARCH_TIMER_EVTSTREAM);
@@ -560,8 +560,8 @@ void arch_timer_enable_workaround(const struct arch_timer_erratum_workaround *wa
* change both the default value and the vdso itself.
*/
if (wa->read_cntvct_el0) {
- clocksource_counter.archdata.vdso_direct = false;
- vdso_default = false;
+ clocksource_counter.archdata.vclock_mode = VCLOCK_NONE;
+ vdso_default = VCLOCK_NONE;
}
}
@@ -979,7 +979,7 @@ static void __init arch_counter_register(unsigned type)
}
arch_timer_read_counter = rd;
- clocksource_counter.archdata.vdso_direct = vdso_default;
+ clocksource_counter.archdata.vclock_mode = vdso_default;
} else {
arch_timer_read_counter = arch_counter_get_cntvct_mem;
}
--
2.24.0
On Hyper-V, a tsc page has the data for adjusting cntvct numbers to
clocksource cycles, and that's how Hyper-V guest kernel reads the
clocksource. In order to allow userspace to read the same clocksource
directly, the tsc page has to been mapped into userspace via vDSO.
Use the framework for vDSO set-up in __vdso_init() to do this.
Note: if HYPERV_TIMER=y but the kernel is using other clocksource or
doesn't have the hyperv timer clocksource, tsc page will still be mapped
into userspace.
Signed-off-by: Boqun Feng (Microsoft) <[email protected]>
---
arch/arm64/kernel/vdso.c | 12 ++++++++++++
arch/arm64/kernel/vdso/vdso.lds.S | 12 +++++++++++-
2 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index b9b5ec7a3084..18a634987bdc 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -9,6 +9,7 @@
#include <linux/cache.h>
#include <linux/clocksource.h>
+#include <clocksource/hyperv_timer.h>
#include <linux/elf.h>
#include <linux/err.h>
#include <linux/errno.h>
@@ -105,14 +106,22 @@ static int __vdso_init(enum arch_vdso_type arch_index)
struct page **vdso_code_pagelist;
unsigned long nr_vdso_pages;
unsigned long pfn;
+ struct ms_hyperv_tsc_page *tsc_page;
+ int tsc_page_idx;
if (memcmp(vdso_lookup[arch_index].vdso_code_start, "\177ELF", 4)) {
pr_err("vDSO is not a valid ELF object!\n");
return -EINVAL;
}
+ /* One vDSO data page */
vdso_lookup[arch_index].nr_vdso_data_pages = 1;
+ /* Grab the Hyper-V tsc page, if enabled, add one more page */
+ tsc_page = hv_get_tsc_page();
+ if (tsc_page)
+ tsc_page_idx = vdso_lookup[arch_index].nr_vdso_data_pages++;
+
vdso_lookup[arch_index].nr_vdso_code_pages = (
vdso_lookup[arch_index].vdso_code_end -
vdso_lookup[arch_index].vdso_code_start) >>
@@ -130,6 +139,9 @@ static int __vdso_init(enum arch_vdso_type arch_index)
/* Grab the vDSO data page. */
vdso_pagelist[0] = phys_to_page(__pa_symbol(vdso_data));
+ if (tsc_page)
+ vdso_pagelist[tsc_page_idx] = phys_to_page(__pa(tsc_page));
+
/* Grab the vDSO code pages. */
pfn = sym_to_pfn(vdso_lookup[arch_index].vdso_code_start);
diff --git a/arch/arm64/kernel/vdso/vdso.lds.S b/arch/arm64/kernel/vdso/vdso.lds.S
index 7ad2d3a0cd48..e40a1f5a6d30 100644
--- a/arch/arm64/kernel/vdso/vdso.lds.S
+++ b/arch/arm64/kernel/vdso/vdso.lds.S
@@ -17,7 +17,17 @@ OUTPUT_ARCH(aarch64)
SECTIONS
{
- PROVIDE(_vdso_data = . - PAGE_SIZE);
+ /*
+ * vdso data pages:
+ * vdso data (1 page)
+ * hv tsc page (1 page if enabled)
+ */
+ PROVIDE(_vdso_data = _hvclock_page - PAGE_SIZE);
+#ifdef CONFIG_HYPERV_TIMER
+ PROVIDE(_hvclock_page = . - PAGE_SIZE);
+#else
+ PROVIDE(_hvclock_page = .);
+#endif
. = VDSO_LBASE + SIZEOF_HEADERS;
.hash : { *(.hash) } :text
--
2.24.0
Since reading hyperv-timer clocksource requires reading cntvct,
userspace should be allowed to read it, otherwise reading cntvct will
result in traps, which makes vsyscall's cost similar compared to
syscall's.
So enable it on every cpu when a Hyper-V guest booting up.
Signed-off-by: Boqun Feng (Microsoft) <[email protected]>
---
arch/arm64/hyperv/hv_init.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/arm64/hyperv/hv_init.c b/arch/arm64/hyperv/hv_init.c
index 86e4621d5885..1ea97ecfb143 100644
--- a/arch/arm64/hyperv/hv_init.c
+++ b/arch/arm64/hyperv/hv_init.c
@@ -27,6 +27,7 @@
#include <linux/sched_clock.h>
#include <asm-generic/bug.h>
#include <asm/hyperv-tlfs.h>
+#include <asm/arch_timer.h>
#include <asm/mshyperv.h>
#include <asm/sysreg.h>
#include <clocksource/hyperv_timer.h>
@@ -45,6 +46,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
static int hv_cpu_init(unsigned int cpu)
{
u64 msr_vp_index;
+ u32 cntkctl;
hv_get_vp_index(msr_vp_index);
@@ -53,6 +55,11 @@ static int hv_cpu_init(unsigned int cpu)
if (msr_vp_index > hv_max_vp_index)
hv_max_vp_index = msr_vp_index;
+ /* Enable EL0 to access cntvct */
+ cntkctl = arch_timer_get_cntkctl();
+ cntkctl |= ARCH_TIMER_USR_VCT_ACCESS_EN;
+ arch_timer_set_cntkctl(cntkctl);
+
return 0;
}
--
2.24.0
Similar to x86, add a new vclock_mode VCLOCK_HVCLOCK, and reuse the
hv_read_tsc_page() for userspace to read tsc page clocksource.
Signed-off-by: Boqun Feng (Microsoft) <[email protected]>
---
arch/arm64/include/asm/clocksource.h | 3 ++-
arch/arm64/include/asm/mshyperv.h | 2 +-
arch/arm64/include/asm/vdso/gettimeofday.h | 19 +++++++++++++++++++
3 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/clocksource.h b/arch/arm64/include/asm/clocksource.h
index fbe80057468c..c6acd45fe748 100644
--- a/arch/arm64/include/asm/clocksource.h
+++ b/arch/arm64/include/asm/clocksource.h
@@ -4,7 +4,8 @@
#define VCLOCK_NONE 0 /* No vDSO clock available. */
#define VCLOCK_CNTVCT 1 /* vDSO should use cntvcnt */
-#define VCLOCK_MAX 1
+#define VCLOCK_HVCLOCK 2 /* vDSO should use vread_hvclock() */
+#define VCLOCK_MAX 2
struct arch_clocksource_data {
int vclock_mode;
diff --git a/arch/arm64/include/asm/mshyperv.h b/arch/arm64/include/asm/mshyperv.h
index 0afb00e3501d..7c85dd816dca 100644
--- a/arch/arm64/include/asm/mshyperv.h
+++ b/arch/arm64/include/asm/mshyperv.h
@@ -90,7 +90,7 @@ extern void hv_get_vpreg_128(u32 reg, struct hv_get_vp_register_output *result);
#define hv_set_reference_tsc(val) \
hv_set_vpreg(HV_REGISTER_REFERENCE_TSC, val)
#define hv_set_clocksource_vdso(val) \
- ((val).archdata.vclock_mode = VCLOCK_NONE)
+ ((val).archdata.vclock_mode = VCLOCK_HVCLOCK)
#if IS_ENABLED(CONFIG_HYPERV)
#define hv_enable_stimer0_percpu_irq(irq) enable_percpu_irq(irq, 0)
diff --git a/arch/arm64/include/asm/vdso/gettimeofday.h b/arch/arm64/include/asm/vdso/gettimeofday.h
index e6e3fe0488c7..7e689b903f4d 100644
--- a/arch/arm64/include/asm/vdso/gettimeofday.h
+++ b/arch/arm64/include/asm/vdso/gettimeofday.h
@@ -67,6 +67,20 @@ int clock_getres_fallback(clockid_t _clkid, struct __kernel_timespec *_ts)
return ret;
}
+#ifdef CONFIG_HYPERV_TIMER
+/* This will override the default hv_get_raw_timer() */
+#define hv_get_raw_timer() __arch_counter_get_cntvct()
+#include <clocksource/hyperv_timer.h>
+
+extern struct ms_hyperv_tsc_page
+_hvclock_page __attribute__((visibility("hidden")));
+
+static u64 vread_hvclock(void)
+{
+ return hv_read_tsc_page(&_hvclock_page);
+}
+#endif
+
static __always_inline u64 __arch_get_hw_counter(s32 clock_mode)
{
u64 res;
@@ -78,6 +92,11 @@ static __always_inline u64 __arch_get_hw_counter(s32 clock_mode)
if (clock_mode == VCLOCK_NONE)
return __VDSO_USE_SYSCALL;
+#ifdef CONFIG_HYPERV_TIMER
+ if (likely(clock_mode == VCLOCK_HVCLOCK))
+ return vread_hvclock();
+#endif
+
/*
* This isb() is required to prevent that the counter value
* is speculated.
--
2.24.0
Hi Boqun Feng,
sorry for the late reply.
On 16/12/2019 00:19, Boqun Feng wrote:
> Hi,
>
> This is the RFC patchset for vDSO support in ARM64 Hyper-V guest. To
> test it, Michael's ARM64 support patchset:
>
> https://lore.kernel.org/linux-arm-kernel/[email protected]/
>
> is needed.
>
> Similar as x86, Hyper-V on ARM64 use a TSC page for guests to read
> the virtualized hardware timer, this TSC page is read-only for the
> guests, so could be used for vDSO data page. And the vDSO (userspace)
> code could use the same code for timer reading as kernel, since
> they read the same TSC page.
>
I had a look to your patches and overall, I could not understand why we can't
use the arch_timer to do the same things you are doing with the one you
introduced in this series. What confuses me is that KVM works just fine with the
arch_timer which was designed with virtualization in mind. Why do we need
another one? Could you please explain?
> This patchset therefore extends ARM64's __vsdo_init() to allow multiple
> data pages and introduces the vclock_mode concept similar to x86 to
> allow different platforms (bare-metal, Hyper-V, etc.) to switch to
> different __arch_get_hw_counter() implementations. The rest of this
> patchset does the necessary setup for Hyper-V guests: mapping tsc page,
> enabling userspace to read cntvct, etc. to enable vDSO.
>
> This patchset consists of 6 patches:
>
> patch #1 allows hv_get_raw_timer() definition to be overridden for
> userspace and kernel to share the same hv_read_tsc_page() definition.
>
> patch #2 extends ARM64 to support multiple vDSO data pages.
>
> patch #3 introduces vclock_mode similiar to x86 to allow different
> __arch_get_hw_counter() implementations for different clocksources.
>
> patch #4 maps Hyper-V TSC page into vDSO data page.
>
> patch #5 allows userspace to read cntvct, so that userspace can
> efficiently read the clocksource.
>
> patch #6 enables the vDSO for ARM64 Hyper-V guest.
>
> The whole patchset is based on v5.5-rc1 plus Michael's ARM64 support
> patchset, and I've done a few tests with:
>
> https://github.com/nlynch-mentor/vdsotest
>
> Comments and suggestions are welcome!
>
> Regards,
> Boqun
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
--
Regards,
Vincenzo
Hi Vincenzo,
On Thu, Jan 23, 2020 at 10:48:07AM +0000, Vincenzo Frascino wrote:
> Hi Boqun Feng,
>
> sorry for the late reply.
>
That's OK, thanks for your review ;-)
> On 16/12/2019 00:19, Boqun Feng wrote:
> > Hi,
> >
> > This is the RFC patchset for vDSO support in ARM64 Hyper-V guest. To
> > test it, Michael's ARM64 support patchset:
> >
> > https://lore.kernel.org/linux-arm-kernel/[email protected]/
> >
> > is needed.
> >
> > Similar as x86, Hyper-V on ARM64 use a TSC page for guests to read
> > the virtualized hardware timer, this TSC page is read-only for the
> > guests, so could be used for vDSO data page. And the vDSO (userspace)
> > code could use the same code for timer reading as kernel, since
> > they read the same TSC page.
> >
>
> I had a look to your patches and overall, I could not understand why we can't
> use the arch_timer to do the same things you are doing with the one you
> introduced in this series. What confuses me is that KVM works just fine with the
> arch_timer which was designed with virtualization in mind. Why do we need
> another one? Could you please explain?
>
Please note that the guest VM on Hyper-V for ARM64 doesn't use
arch_timer as the clocksource. See:
https://lore.kernel.org/linux-arm-kernel/[email protected]/
, ACPI_SIG_GTDT is used for setting up Hyper-V synthetic clocksource
and other initialization work.
So just to be clear, your suggestion is
1) Hyper-V guest on ARM64 should use arch_timer as clocksource and vDSO
will just work.
or
2) Even though arch_timer is not used as the clocksource, we can still
use it for vDSO.
?
Regards,
Boqun
> > This patchset therefore extends ARM64's __vsdo_init() to allow multiple
> > data pages and introduces the vclock_mode concept similar to x86 to
> > allow different platforms (bare-metal, Hyper-V, etc.) to switch to
> > different __arch_get_hw_counter() implementations. The rest of this
> > patchset does the necessary setup for Hyper-V guests: mapping tsc page,
> > enabling userspace to read cntvct, etc. to enable vDSO.
> >
> > This patchset consists of 6 patches:
> >
> > patch #1 allows hv_get_raw_timer() definition to be overridden for
> > userspace and kernel to share the same hv_read_tsc_page() definition.
> >
> > patch #2 extends ARM64 to support multiple vDSO data pages.
> >
> > patch #3 introduces vclock_mode similiar to x86 to allow different
> > __arch_get_hw_counter() implementations for different clocksources.
> >
> > patch #4 maps Hyper-V TSC page into vDSO data page.
> >
> > patch #5 allows userspace to read cntvct, so that userspace can
> > efficiently read the clocksource.
> >
> > patch #6 enables the vDSO for ARM64 Hyper-V guest.
> >
> > The whole patchset is based on v5.5-rc1 plus Michael's ARM64 support
> > patchset, and I've done a few tests with:
> >
> > https://github.com/nlynch-mentor/vdsotest
> >
> > Comments and suggestions are welcome!
> >
> > Regards,
> > Boqun
> >
> > _______________________________________________
> > linux-arm-kernel mailing list
> > [email protected]
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> >
>
> --
> Regards,
> Vincenzo
Hi Boqun Feng,
On 24/01/2020 06:32, Boqun Feng wrote:
> Hi Vincenzo,
>
[...]
>>
>> I had a look to your patches and overall, I could not understand why we can't
>> use the arch_timer to do the same things you are doing with the one you
>> introduced in this series. What confuses me is that KVM works just fine with the
>> arch_timer which was designed with virtualization in mind. Why do we need
>> another one? Could you please explain?
>>
>
> Please note that the guest VM on Hyper-V for ARM64 doesn't use
> arch_timer as the clocksource. See:
>
> https://lore.kernel.org/linux-arm-kernel/[email protected]/
>
> , ACPI_SIG_GTDT is used for setting up Hyper-V synthetic clocksource
> and other initialization work.
>
I had a look a look at it and my question stands, why do we need another timer
on arm64?
> So just to be clear, your suggestion is
>
> 1) Hyper-V guest on ARM64 should use arch_timer as clocksource and vDSO
> will just work.
>
> or
>
> 2) Even though arch_timer is not used as the clocksource, we can still
> use it for vDSO.
>
> ?
>
Option #1 would be the preferred solution, unless there is a good reason against.
> Regards,
> Boqun
>
--
Regards,
Vincenzo
On Fri, Jan 24, 2020 at 10:24:44AM +0000, Vincenzo Frascino wrote:
> Hi Boqun Feng,
>
> On 24/01/2020 06:32, Boqun Feng wrote:
> > Hi Vincenzo,
> >
>
> [...]
>
> >>
> >> I had a look to your patches and overall, I could not understand why we can't
> >> use the arch_timer to do the same things you are doing with the one you
> >> introduced in this series. What confuses me is that KVM works just fine with the
> >> arch_timer which was designed with virtualization in mind. Why do we need
> >> another one? Could you please explain?
> >>
> >
> > Please note that the guest VM on Hyper-V for ARM64 doesn't use
> > arch_timer as the clocksource. See:
> >
> > https://lore.kernel.org/linux-arm-kernel/[email protected]/
> >
> > , ACPI_SIG_GTDT is used for setting up Hyper-V synthetic clocksource
> > and other initialization work.
> >
>
> I had a look a look at it and my question stands, why do we need another timer
> on arm64?
>
Sorry for the late response. It's weekend and Chinese New Year, so I got
to spend some time making (and mostly eating) dumplings ;-)
After discussion with Michael, here is some explanation why we need
another timer:
The synthetic clocks that Hyper-V presents in a guest VM were originally
created for the x86 architecture. They provide a level of abstraction
that solves problems like continuity across live migrations where the
hardware clock (i.e., TSC in the case x86) frequency may be different
across the migration. When Hyper-V was brought to ARM64, this
abstraction was maintained to provide consistency across the x86 and
ARM64 architectures, and for both Windows and Linux guest VMs. The
core Linux code for the Hyper-V clocks (in
drivers/clocksource/hyperv_timer.c) is architecture neutral and works on
both x86 and ARM64. As you can see, this part is done in Michael's
patchset.
Arguably, Hyper-V for ARM64 should have optimized for consistency with
the ARM64 community rather with the existing x86 implementation and
existing guest code in Windows. But at this point, it is what it is,
and the Hyper-V clocks do solve problems like migration that aren’t
addressed in ARM64 until v8.4 of the architecture with the addition of
the counter hardware scaling feature. Hyper-V doesn’t currently map the
ARM arch timer interrupts into guest VMs, so we need to use the existing
Hyper-V clocks and the common code that already exists.
Does the above answer your question?
Regards,
Boqun
> > So just to be clear, your suggestion is
> >
> > 1) Hyper-V guest on ARM64 should use arch_timer as clocksource and vDSO
> > will just work.
> >
> > or
> >
> > 2) Even though arch_timer is not used as the clocksource, we can still
> > use it for vDSO.
> >
> > ?
> >
>
> Option #1 would be the preferred solution, unless there is a good reason against.
>
> > Regards,
> > Boqun
> >
>
> --
> Regards,
> Vincenzo
On 2020-01-28 05:58, Boqun Feng wrote:
> On Fri, Jan 24, 2020 at 10:24:44AM +0000, Vincenzo Frascino wrote:
>> Hi Boqun Feng,
>>
>> On 24/01/2020 06:32, Boqun Feng wrote:
>> > Hi Vincenzo,
>> >
>>
>> [...]
>>
>> >>
>> >> I had a look to your patches and overall, I could not understand why we can't
>> >> use the arch_timer to do the same things you are doing with the one you
>> >> introduced in this series. What confuses me is that KVM works just fine with the
>> >> arch_timer which was designed with virtualization in mind. Why do we need
>> >> another one? Could you please explain?
>> >>
>> >
>> > Please note that the guest VM on Hyper-V for ARM64 doesn't use
>> > arch_timer as the clocksource. See:
>> >
>> > https://lore.kernel.org/linux-arm-kernel/[email protected]/
>> >
>> > , ACPI_SIG_GTDT is used for setting up Hyper-V synthetic clocksource
>> > and other initialization work.
>> >
>>
>> I had a look a look at it and my question stands, why do we need
>> another timer
>> on arm64?
>>
>
> Sorry for the late response. It's weekend and Chinese New Year, so I
> got
> to spend some time making (and mostly eating) dumplings ;-)
And you haven't been sharing! ;-)
> After discussion with Michael, here is some explanation why we need
> another timer:
>
> The synthetic clocks that Hyper-V presents in a guest VM were
> originally
> created for the x86 architecture. They provide a level of abstraction
> that solves problems like continuity across live migrations where the
> hardware clock (i.e., TSC in the case x86) frequency may be different
> across the migration. When Hyper-V was brought to ARM64, this
> abstraction was maintained to provide consistency across the x86 and
> ARM64 architectures, and for both Windows and Linux guest VMs. The
> core Linux code for the Hyper-V clocks (in
> drivers/clocksource/hyperv_timer.c) is architecture neutral and works
> on
> both x86 and ARM64. As you can see, this part is done in Michael's
> patchset.
>
> Arguably, Hyper-V for ARM64 should have optimized for consistency with
> the ARM64 community rather with the existing x86 implementation and
> existing guest code in Windows. But at this point, it is what it is,
> and the Hyper-V clocks do solve problems like migration that aren’t
> addressed in ARM64 until v8.4 of the architecture with the addition of
> the counter hardware scaling feature. Hyper-V doesn’t currently map the
> ARM arch timer interrupts into guest VMs, so we need to use the
> existing
> Hyper-V clocks and the common code that already exists.
The migration thing is a bit of a red herring. Do you really anticipate
VM migration across systems that have their timers running at different
frequencies *today*? And even if you did, there are ways to deal with it
with the arch timers (patches to that effect were posted on the list,
and
there was even a bit of an ARM spec for it).
I find it odd to try and make arm64 "just another x86", while the
architecture
gives you most of what you need already. I guess I'm tainted.
Thanks,
M.
--
Who you jivin' with that Cosmik Debris?