2015-12-28 21:53:14

by Joao Martins

[permalink] [raw]
Subject: [PATCH RFC 0/3] x86/xen: pvclock vdso support

Hey!

This series proposes support for pvclock vdso under Xen: Patch 1 adds
setting cpu 0 pvti page; Patch 2 registers the pvti page with Xen and sets
it accordingly in pvclock and Patch 3 adds a Kconfig option since Xen
doesn't yet support the PVCLOCK_TSC_STABLE_BIT flag. Though its support
will probably be discussed in another RFC I sent[0].

Any comments or suggestions are welcome!

Thanks,
Joao

[0] http://lists.xenproject.org/archives/html/xen-devel/2015-12/msg02835.html

Joao Martins (3):
x86/pvclock: add setter for pvclock_pvti_cpu0_va
x86/xen/time: setup vcpu 0 time info page
xen/Kconfig: add XEN_TIME_VSYSCALL option

arch/x86/include/asm/pvclock.h | 22 ++++++++------
arch/x86/kernel/kvmclock.c | 6 +---
arch/x86/kernel/pvclock.c | 11 +++++++
arch/x86/xen/Kconfig | 5 ++++
arch/x86/xen/time.c | 66 ++++++++++++++++++++++++++++++++++++++++++
include/xen/interface/vcpu.h | 28 ++++++++++++++++++
6 files changed, 124 insertions(+), 14 deletions(-)

--
2.1.4


2015-12-28 21:53:30

by Joao Martins

[permalink] [raw]
Subject: [PATCH RFC 1/3] x86/pvclock: add setter for pvclock_pvti_cpu0_va

Right now there is only a pvclock_pvti_cpu0_va() which is defined on
kvmclock since:

commit dac16fba6fc5
("x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap")

The only user of this interface so far is kvm. This commit adds a setter
function for the pvti page and moves pvclock_pvti_cpu0_va to pvclock, which
is a more generic place to have it; and would allow other PV clocksources
to use it, such as Xen.

Signed-off-by: Joao Martins <[email protected]>
---
arch/x86/include/asm/pvclock.h | 22 +++++++++++++---------
arch/x86/kernel/kvmclock.c | 6 +-----
arch/x86/kernel/pvclock.c | 11 +++++++++++
3 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index 66df22b..cfb1bb6 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -4,15 +4,6 @@
#include <linux/clocksource.h>
#include <asm/pvclock-abi.h>

-#ifdef CONFIG_PARAVIRT_CLOCK
-extern struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void);
-#else
-static inline struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
-{
- return NULL;
-}
-#endif
-
/* some helper functions for xen and kvm pv clock sources */
cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src);
@@ -101,4 +92,17 @@ struct pvclock_vsyscall_time_info {

#define PVTI_SIZE sizeof(struct pvclock_vsyscall_time_info)

+#ifdef CONFIG_PARAVIRT_CLOCK
+void pvclock_set_pvti_cpu0_va(struct pvclock_vsyscall_time_info *pvti);
+struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void);
+#else
+static inline void pvclock_set_pvti_cpu0_va(struct pvclock_vsyscall_time_info *)
+{
+}
+static inline struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
+{
+ return NULL;
+}
+#endif
+
#endif /* _ASM_X86_PVCLOCK_H */
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 72cef58..02a5d9e6 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -45,11 +45,6 @@ early_param("no-kvmclock", parse_no_kvmclock);
static struct pvclock_vsyscall_time_info *hv_clock;
static struct pvclock_wall_clock wall_clock;

-struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
-{
- return hv_clock;
-}
-
/*
* The wallclock is the time of day when we booted. Since then, some time may
* have elapsed since the hypervisor wrote the data. So we try to account for
@@ -329,6 +324,7 @@ int __init kvm_setup_vsyscall_timeinfo(void)
return 1;
}

+ pvclock_set_pvti_cpu0_va(hv_clock);
put_cpu();

kvm_clock.archdata.vclock_mode = VCLOCK_PVCLOCK;
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 99bfc02..da6fbe2 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -25,6 +25,7 @@
#include <asm/pvclock.h>

static u8 valid_flags __read_mostly = 0;
+static struct pvclock_vsyscall_time_info *pvti_cpu0_va __read_mostly;

void pvclock_set_flags(u8 flags)
{
@@ -140,3 +141,13 @@ void pvclock_read_wallclock(struct pvclock_wall_clock *wall_clock,

set_normalized_timespec(ts, now.tv_sec, now.tv_nsec);
}
+
+void pvclock_set_pvti_cpu0_va(struct pvclock_vsyscall_time_info *pvti)
+{
+ pvti_cpu0_va = pvti;
+}
+
+struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
+{
+ return pvti_cpu0_va;
+}
--
2.1.4

2015-12-28 21:53:25

by Joao Martins

[permalink] [raw]
Subject: [PATCH RFC 2/3] x86/xen/time: setup vcpu 0 time info page

In order to support pvclock vdso on xen we need to setup the
time info page for vcpu 0 and register the page with Xen using
the VCPUOP_register_vcpu_time_memory_area hypercall. This
hypercall will also forcefully update the pvti which will set
some of the necessary flags for vdso. Afterwards we check if it
supports the PVCLOCK_TSC_STABLE_BIT flag which is mandatory for
having vdso/vsyscall support. And if so, it will set the cpu 0
pvti that will be later on used when mapping the vdso image.

The xen headers are also updated to include the new hypercall for
registering the secondary vcpu_time_info copy.

Signed-off-by: Joao Martins <[email protected]>
---
arch/x86/xen/time.c | 66 ++++++++++++++++++++++++++++++++++++++++++++
include/xen/interface/vcpu.h | 28 +++++++++++++++++++
2 files changed, 94 insertions(+)

diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index a0a4e55..c17b1b2 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -17,6 +17,7 @@
#include <linux/slab.h>
#include <linux/pvclock_gtod.h>
#include <linux/timekeeper_internal.h>
+#include <linux/memblock.h>

#include <asm/pvclock.h>
#include <asm/xen/hypervisor.h>
@@ -403,6 +404,69 @@ static const struct pv_time_ops xen_time_ops __initconst = {
.sched_clock = xen_clocksource_read,
};

+#ifdef CONFIG_XEN_TIME_VSYSCALL
+static struct pvclock_vsyscall_time_info *xen_clock __read_mostly;
+
+static int xen_setup_vsyscall_time_info(int cpu)
+{
+ struct pvclock_vsyscall_time_info *ti;
+ struct vcpu_register_time_memory_area t;
+ struct pvclock_vcpu_time_info *pvti;
+ unsigned long mem;
+ int ret, size;
+ u8 flags;
+
+ ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area,
+ cpu, NULL);
+ if (ret == -ENOSYS) {
+ pr_debug("xen: vcpu_time_info placement not supported\n");
+ return -ENOTSUPP;
+ }
+
+ size = PAGE_ALIGN(sizeof(struct pvclock_vsyscall_time_info));
+ mem = memblock_alloc(size, PAGE_SIZE);
+ if (!mem)
+ return -ENOMEM;
+
+ ti = __va(mem);
+ memset(ti, 0, size);
+
+ t.addr.v = &ti[cpu].pvti;
+
+ ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area,
+ cpu, &t);
+
+ if (ret) {
+ pr_debug("xen: cannot register vcpu_time_info err %d\n", ret);
+ memblock_free(mem, size);
+ return ret;
+ }
+
+ pvti = &ti[cpu].pvti;
+ flags = pvti->flags;
+
+ if (!(flags & PVCLOCK_TSC_STABLE_BIT)) {
+ pr_debug("xen: VCLOCK_PVCLOCK not supported\n");
+ memblock_free(mem, size);
+ return -ENOTSUPP;
+ }
+
+ xen_clock = ti;
+ pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
+ pvclock_set_pvti_cpu0_va(xen_clock);
+
+ xen_clocksource.archdata.vclock_mode = VCLOCK_PVCLOCK;
+
+ return 0;
+}
+#else
+static int xen_setup_vsyscall_time_info(int cpu)
+{
+ return -1;
+}
+
+#endif /* CONFIG_XEN_TIME_VSYSCALL */
+
static void __init xen_time_init(void)
{
int cpu = smp_processor_id();
@@ -431,6 +495,8 @@ static void __init xen_time_init(void)
xen_setup_timer(cpu);
xen_setup_cpu_clockevents();

+ xen_setup_vsyscall_time_info(cpu);
+
if (xen_initial_domain())
pvclock_gtod_register_notifier(&xen_pvclock_gtod_notifier);
}
diff --git a/include/xen/interface/vcpu.h b/include/xen/interface/vcpu.h
index b05288c..902b59e 100644
--- a/include/xen/interface/vcpu.h
+++ b/include/xen/interface/vcpu.h
@@ -172,4 +172,32 @@ DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_vcpu_info);

/* Send an NMI to the specified VCPU. @extra_arg == NULL. */
#define VCPUOP_send_nmi 11
+
+/*
+ * Register a memory location to get a secondary copy of the vcpu time
+ * parameters. The master copy still exists as part of the vcpu shared
+ * memory area, and this secondary copy is updated whenever the master copy
+ * is updated (and using the same versioning scheme for synchronisation).
+ *
+ * The intent is that this copy may be mapped (RO) into userspace so
+ * that usermode can compute system time using the time info and the
+ * tsc. Usermode will see an array of vcpu_time_info structures, one
+ * for each vcpu, and choose the right one by an existing mechanism
+ * which allows it to get the current vcpu number (such as via a
+ * segment limit). It can then apply the normal algorithm to compute
+ * system time from the tsc.
+ *
+ * @extra_arg == pointer to vcpu_register_time_info_memory_area structure.
+ */
+#define VCPUOP_register_vcpu_time_memory_area 13
+DEFINE_GUEST_HANDLE_STRUCT(vcpu_time_info_t);
+struct vcpu_register_time_memory_area {
+ union {
+ GUEST_HANDLE(vcpu_time_info_t) h;
+ struct pvclock_vcpu_time_info *v;
+ uint64_t p;
+ } addr;
+};
+DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_time_memory_area_t);
+
#endif /* __XEN_PUBLIC_VCPU_H__ */
--
2.1.4

2015-12-28 21:53:22

by Joao Martins

[permalink] [raw]
Subject: [PATCH RFC 3/3] xen/Kconfig: add XEN_TIME_VSYSCALL option

This option enables support for pvclock vsyscall/vdso
support on Xen. Default is off, since Xen doesn't
expose yet the PVCLOCK_TSC_STABLE_BIT flag.

Signed-off-by: Joao Martins <[email protected]>
---
arch/x86/xen/Kconfig | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index c7b15f3..636eaeb 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -37,6 +37,11 @@ config XEN_512GB
It is always possible to change the default via specifying the
boot parameter "xen_512gb_limit".

+config XEN_TIME_VSYSCALL
+ bool "Enable Xen support for pvclock vsyscall/vdso"
+ depends on XEN && PARAVIRT_CLOCK
+ def_bool n
+
config XEN_SAVE_RESTORE
bool
depends on XEN
--
2.1.4

2015-12-28 23:45:45

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH RFC 1/3] x86/pvclock: add setter for pvclock_pvti_cpu0_va

On Mon, Dec 28, 2015 at 1:52 PM, Joao Martins <[email protected]> wrote:
> Right now there is only a pvclock_pvti_cpu0_va() which is defined on
> kvmclock since:
>
> commit dac16fba6fc5
> ("x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap")
>
> The only user of this interface so far is kvm. This commit adds a setter
> function for the pvti page and moves pvclock_pvti_cpu0_va to pvclock, which
> is a more generic place to have it; and would allow other PV clocksources
> to use it, such as Xen.
>

> +
> +void pvclock_set_pvti_cpu0_va(struct pvclock_vsyscall_time_info *pvti)
> +{
> + pvti_cpu0_va = pvti;
> +}

IMO this either wants to be __init or wants a
WARN_ON(vclock_was_used(VCLOCK_PVCLOCK)). The latter hasn't landed in
-tip yet, but I think it'll land next week unless the merge window
opens early.

It may pay to actually separate out the kvm-clock clocksource and
rename it rather than partially duplicating it, assuming the result
wouldn't be messy.

Can you CC me on the rest of the series for new versions?

BTW, since this seems to require hypervisor changes to be useful, it
might make sense to rethink the interface a bit. Are you actually
planning to support per-cpu pvti for this in any useful way? If not,
I think that this would work a whole lot better and be considerably
less code if you had a single global pvti that lived in
hypervisor-allocated memory instead of an array that lives in guest
memory. I'd be happy to discuss next week in more detail (currently
on vacation).

--Andy

2015-12-29 12:51:19

by Joao Martins

[permalink] [raw]
Subject: Re: [PATCH RFC 1/3] x86/pvclock: add setter for pvclock_pvti_cpu0_va

On 12/28/2015 11:45 PM, Andy Lutomirski wrote:
> On Mon, Dec 28, 2015 at 1:52 PM, Joao Martins <[email protected]> wrote:
>> Right now there is only a pvclock_pvti_cpu0_va() which is defined on
>> kvmclock since:
>>
>> commit dac16fba6fc5
>> ("x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap")
>>
>> The only user of this interface so far is kvm. This commit adds a setter
>> function for the pvti page and moves pvclock_pvti_cpu0_va to pvclock, which
>> is a more generic place to have it; and would allow other PV clocksources
>> to use it, such as Xen.
>>
>
>> +
>> +void pvclock_set_pvti_cpu0_va(struct pvclock_vsyscall_time_info *pvti)
>> +{
>> + pvti_cpu0_va = pvti;
>> +}
>
> IMO this either wants to be __init or wants a
> WARN_ON(vclock_was_used(VCLOCK_PVCLOCK)). The latter hasn't landed in
> -tip yet, but I think it'll land next week unless the merge window
> opens early.
OK, I will add those two once it lands in -tip.

I had a silly mistake in this patch as I bindly ommited the parameter name to
keep checkpatch happy, but didn't compile check when built without PARAVIRT.
Apologies for that and will fix that also on the next version.

>
> It may pay to actually separate out the kvm-clock clocksource and
> rename it rather than partially duplicating it, assuming the result
> wouldn't be messy.
>
Not sure if I follow but I moved out pvclock_pvti_cpu0_va from kvm-clock or do
you mean to separate out kvm-clock in it's enterity, or something else within
kvm-clock is that is common to both (such as kvm_setup_vsyscall_timeinfo) ?

> Can you CC me on the rest of the series for new versions?
>
Sure! Thanks for the prompt reply.

> BTW, since this seems to require hypervisor changes to be useful, it
> might make sense to rethink the interface a bit. Are you actually
> planning to support per-cpu pvti for this in any useful way? If not,
> I think that this would work a whole lot better and be considerably
> less code if you had a single global pvti that lived in
> hypervisor-allocated memory instead of an array that lives in guest
> memory. I'd be happy to discuss next week in more detail (currently
> on vacation).
Initially I had this series using per-cpu pvti's based on Linux 4.4 but since
that was removed in favor of vdso using solely cpu0 pvti, then I ended up just
registering the cpu 0 page. I don't intend to add per-cpu pvti's since it would
only be used for this case: (unless the reviewers think it should be done)
meaning I would register pvti's for the other CPUs without having them used.
Having a global pvti as you suggest it would get a lot simpler for the guest,
but I guess this would only work assuming PVCLOCK_TSC_STABLE_BIT is there?
Looking forward to discuss it next week.

Joao

>
> --Andy
>

2015-12-29 13:03:31

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH RFC 1/3] x86/pvclock: add setter for pvclock_pvti_cpu0_va

On Tue, Dec 29, 2015 at 4:50 AM, Joao Martins <[email protected]> wrote:
> On 12/28/2015 11:45 PM, Andy Lutomirski wrote:
>> On Mon, Dec 28, 2015 at 1:52 PM, Joao Martins <[email protected]> wrote:
>>> Right now there is only a pvclock_pvti_cpu0_va() which is defined on
>>> kvmclock since:
>>>
>>> commit dac16fba6fc5
>>> ("x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap")
>>>
>>> The only user of this interface so far is kvm. This commit adds a setter
>>> function for the pvti page and moves pvclock_pvti_cpu0_va to pvclock, which
>>> is a more generic place to have it; and would allow other PV clocksources
>>> to use it, such as Xen.
>>>
>>
>>> +
>>> +void pvclock_set_pvti_cpu0_va(struct pvclock_vsyscall_time_info *pvti)
>>> +{
>>> + pvti_cpu0_va = pvti;
>>> +}
>>
>> IMO this either wants to be __init or wants a
>> WARN_ON(vclock_was_used(VCLOCK_PVCLOCK)). The latter hasn't landed in
>> -tip yet, but I think it'll land next week unless the merge window
>> opens early.
> OK, I will add those two once it lands in -tip.
>
> I had a silly mistake in this patch as I bindly ommited the parameter name to
> keep checkpatch happy, but didn't compile check when built without PARAVIRT.
> Apologies for that and will fix that also on the next version.
>
>>
>> It may pay to actually separate out the kvm-clock clocksource and
>> rename it rather than partially duplicating it, assuming the result
>> wouldn't be messy.
>>
> Not sure if I follow but I moved out pvclock_pvti_cpu0_va from kvm-clock or do
> you mean to separate out kvm-clock in it's enterity, or something else within
> kvm-clock is that is common to both (such as kvm_setup_vsyscall_timeinfo) ?

I meant literally using the same clocksource. I don't know whether
the Xen and KVM variants are similar enough for that to make sense.

>
>> Can you CC me on the rest of the series for new versions?
>>
> Sure! Thanks for the prompt reply.
>
>> BTW, since this seems to require hypervisor changes to be useful, it
>> might make sense to rethink the interface a bit. Are you actually
>> planning to support per-cpu pvti for this in any useful way? If not,
>> I think that this would work a whole lot better and be considerably
>> less code if you had a single global pvti that lived in
>> hypervisor-allocated memory instead of an array that lives in guest
>> memory. I'd be happy to discuss next week in more detail (currently
>> on vacation).
> Initially I had this series using per-cpu pvti's based on Linux 4.4 but since
> that was removed in favor of vdso using solely cpu0 pvti, then I ended up just
> registering the cpu 0 page. I don't intend to add per-cpu pvti's since it would
> only be used for this case: (unless the reviewers think it should be done)
> meaning I would register pvti's for the other CPUs without having them used.
> Having a global pvti as you suggest it would get a lot simpler for the guest,
> but I guess this would only work assuming PVCLOCK_TSC_STABLE_BIT is there?
> Looking forward to discuss it next week.

Sounds good.

--Andy