2009-09-02 14:08:40

by Glauber Costa

Subject: [PATCH v2 0/2] Automatically grab wallclock time updates from hypervisor

Hi folks,

In this proposed patch, I am introducing a worker fired by kvmclock that updates
guest wallclock periodically to reflect changes in the host's wallclock. With this
patch, a large pool of VMs will no longer have to run NTP in all of its guests.

The worker does that at a configurable interval, with a minimum granularity of 1
second. So, although not exactly cheap, the msr write needed to get an updated
wallclock value won't pose a heavy burden on the system.

It is also possible to disable it completely if this behaviour is undesired for
a specific scenario.
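
The interval semantics described above (whole seconds, with 0 meaning "disabled") can be sketched in plain userspace C. This is only an illustration, not the kernel code: `next_update_delay` and its `hz` parameter are hypothetical names, and the conversion assumes an ideal tick rate rather than the rounding the kernel's `timespec_to_jiffies()` performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert the update interval (in seconds) into a delay
 * expressed in timer ticks. Returns false when the interval is 0, meaning
 * the worker must not be rescheduled at all.
 */
bool next_update_delay(unsigned int interval_sec, unsigned int hz,
                       unsigned long *delay_ticks)
{
    assert(hz > 0 && delay_ticks != NULL);

    if (interval_sec == 0)
        return false;   /* feature disabled, nothing to schedule */

    *delay_ticks = (unsigned long)interval_sec * hz;
    return true;
}
```

With a tick rate of 250, an interval of 5 seconds yields a delay of 1250 ticks, while an interval of 0 leaves the output untouched and reports that no update should be scheduled.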

Changes from v1:
* disabled by default
* adjust clock in a loop, to protect against host scheduling delays.
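
The loop mentioned in the last item can be modelled in userspace roughly as follows. This is a sketch under stated assumptions: the callbacks and the `fake_clock` structure are hypothetical stand-ins for the MSR read and `do_settimeofday()`, and only the 1/8 second threshold is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical callbacks standing in for the MSR read and do_settimeofday(). */
typedef uint64_t (*read_host_ns_fn)(void *ctx);
typedef void (*set_guest_ns_fn)(void *ctx, uint64_t ns);

/*
 * Repeat read/set/read until the two host readings are at most 1/8 s apart,
 * i.e. until we were not descheduled for long between reading the host time
 * and applying it. Returns the number of iterations taken.
 */
unsigned int sync_until_stable(read_host_ns_fn read_host,
                               set_guest_ns_fn set_guest, void *ctx)
{
    unsigned int tries = 0;
    uint64_t now, after;

    assert(read_host && set_guest);
    do {
        now = read_host(ctx);
        set_guest(ctx, now);
        after = read_host(ctx);
        tries++;
    } while (after - now > NSEC_PER_SEC / 8);

    return tries;
}

/* Fake clock for demonstration: each read advances by a scripted step. */
struct fake_clock {
    uint64_t host_ns;
    uint64_t guest_ns;
    const uint64_t *steps;      /* nanoseconds added before each read */
    unsigned int nsteps, next;
};

uint64_t fake_read(void *ctx)
{
    struct fake_clock *c = ctx;

    if (c->next < c->nsteps)
        c->host_ns += c->steps[c->next++];
    return c->host_ns;
}

void fake_set(void *ctx, uint64_t ns)
{
    struct fake_clock *c = ctx;

    c->guest_ns = ns;
}
```

If the scripted steps make the second read land half a second after the first, the first pass is discarded and the guest time comes from the second, undisturbed pass.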

diffstat follows:

 arch/x86/include/asm/kvm_para.h |    6 +++
 arch/x86/kernel/kvmclock.c      |   85 ++++++++++++++++++++++++++++++++++----
 kernel/sysctl.c                 |   13 ++++++
 3 files changed, 95 insertions(+), 9 deletions(-)


2009-09-02 14:09:00

by Glauber Costa

Subject: [PATCH v2 1/2] keep guest wallclock in sync with host clock

KVM clock is great for avoiding drift in guest VMs running on top of kvm.
However, the current mechanism will not propagate changes in wallclock value
upwards. This effectively means that in a large pool of VMs that need accurate timing,
all of them have to run NTP, instead of just the host doing it.

Since the host updates information in the shared memory area upon msr writes,
this patch introduces a worker that writes to that msr, and calls do_settimeofday
at fixed intervals, with second resolution. An interval of 0 means that we
are not interested in this behaviour. A later patch will make this optional at
runtime.
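
For readers unfamiliar with the MSR write: the patch below hands the host the physical address of the shared `wall_clock` structure as two 32-bit halves. A userspace sketch of that split follows; `split_msr_value` and `join_msr_value` are hypothetical helper names used only for illustration, not functions from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Split a 64-bit physical address into the two 32-bit halves of an MSR write. */
void split_msr_value(uint64_t paddr, uint32_t *low, uint32_t *high)
{
    assert(low && high);
    *low = (uint32_t)(paddr & 0xffffffffu);
    *high = (uint32_t)(paddr >> 32);
}

/* Recombine the halves, as the receiving side would. */
uint64_t join_msr_value(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | low;
}
```

Splitting and rejoining must round-trip exactly, otherwise the host would write the wallclock data to the wrong guest address.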

Signed-off-by: Glauber Costa <[email protected]>
---
 arch/x86/kernel/kvmclock.c |   70 ++++++++++++++++++++++++++++++++++++++-----
 1 files changed, 61 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index e5efcdc..555aab0 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -27,6 +27,7 @@
 #define KVM_SCALE 22

 static int kvmclock = 1;
+static unsigned int kvm_wall_update_interval = 0;

 static int parse_no_kvmclock(char *arg)
 {
@@ -39,24 +40,75 @@ early_param("no-kvmclock", parse_no_kvmclock);
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock);
 static struct pvclock_wall_clock wall_clock;

-/*
- * The wallclock is the time of day when we booted. Since then, some time may
- * have elapsed since the hypervisor wrote the data. So we try to account for
- * that with system time
- */
-static unsigned long kvm_get_wallclock(void)
+static void kvm_get_wall_ts(struct timespec *ts)
 {
-	struct pvclock_vcpu_time_info *vcpu_time;
-	struct timespec ts;
 	int low, high;
+	struct pvclock_vcpu_time_info *vcpu_time;

 	low = (int)__pa_symbol(&wall_clock);
 	high = ((u64)__pa_symbol(&wall_clock) >> 32);
 	native_write_msr(MSR_KVM_WALL_CLOCK, low, high);

 	vcpu_time = &get_cpu_var(hv_clock);
-	pvclock_read_wallclock(&wall_clock, vcpu_time, &ts);
+	pvclock_read_wallclock(&wall_clock, vcpu_time, ts);
 	put_cpu_var(hv_clock);
+}
+
+static void kvm_sync_wall_clock(struct work_struct *work);
+static DECLARE_DELAYED_WORK(kvm_sync_wall_work, kvm_sync_wall_clock);
+
+static void schedule_next_update(void)
+{
+	struct timespec next;
+
+	if ((kvm_wall_update_interval == 0) ||
+	    (!kvm_para_available()) ||
+	    (!kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)))
+		return;
+
+	next.tv_sec = kvm_wall_update_interval;
+	next.tv_nsec = 0;
+
+	schedule_delayed_work(&kvm_sync_wall_work, timespec_to_jiffies(&next));
+}
+
+static void kvm_sync_wall_clock(struct work_struct *work)
+{
+	struct timespec now, after;
+	u64 nsec_delta;
+
+	do {
+		kvm_get_wall_ts(&now);
+		do_settimeofday(&now);
+		kvm_get_wall_ts(&after);
+		nsec_delta = (u64)after.tv_sec * NSEC_PER_SEC + after.tv_nsec;
+		nsec_delta -= (u64)now.tv_sec * NSEC_PER_SEC + now.tv_nsec;
+	} while (nsec_delta > NSEC_PER_SEC / 8);
+
+	schedule_next_update();
+}
+
+static __init int init_updates(void)
+{
+	schedule_next_update();
+	return 0;
+}
+/*
+ * It has to be run after workqueues are initialized, since we call
+ * schedule_delayed_work. Other than that, we have no specific requirements
+ */
+late_initcall(init_updates);
+
+/*
+ * The wallclock is the time of day when we booted. Since then, some time may
+ * have elapsed since the hypervisor wrote the data. So we try to account for
+ * that with system time
+ */
+static unsigned long kvm_get_wallclock(void)
+{
+	struct timespec ts;
+
+	kvm_get_wall_ts(&ts);

 	return ts.tv_sec;
 }
--
1.6.2.2

2009-09-02 14:08:44

by Glauber Costa

Subject: [PATCH v2 2/2] add sysctl for kvm wallclock sync

This patch introduces a new sysctl called kvm_sync_wallclock.

It controls the behaviour of the worker that updates guest wallclock time.
The worker will fire in periods specified by its value, if it is greater than zero,
and not fire at all otherwise.
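
The value handling described above can be modelled in userspace as follows. This is a sketch, not the handler from the patch: `set_sync_interval` and `worker_armed` are hypothetical names, and the parsing only approximates what a minimum-of-zero integer sysctl would accept.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical userspace model of the sysctl write path: parse the text,
 * reject anything below zero, and store the new interval in seconds.
 * Returns 0 on success or -EINVAL, leaving *interval untouched on error.
 */
int set_sync_interval(const char *text, unsigned int *interval)
{
    char *end;
    long value;

    assert(text && interval);
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || end == text || (*end != '\0' && *end != '\n'))
        return -EINVAL;
    if (value < 0 || value > INT_MAX)
        return -EINVAL;

    *interval = (unsigned int)value;
    return 0;
}

/* The worker is armed only for a non-zero interval. */
int worker_armed(unsigned int interval)
{
    return interval != 0;
}
```

Writing "30" arms the worker with a 30 second period, writing "0" disarms it, and a negative or non-numeric value is rejected without changing the current setting.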

Signed-off-by: Glauber Costa <[email protected]>
---
 arch/x86/include/asm/kvm_para.h |    6 ++++++
 arch/x86/kernel/kvmclock.c      |   17 ++++++++++++++++-
 kernel/sysctl.c                 |   13 +++++++++++++
 3 files changed, 35 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index b8a3305..3a3f38f 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -47,8 +47,14 @@ struct kvm_mmu_op_release_pt {

 #ifdef __KERNEL__
 #include <asm/processor.h>
+#include <linux/sysctl.h>

 extern void kvmclock_init(void);
+extern unsigned int kvm_wall_update_interval;
+extern int kvm_sync_wall_handler(struct ctl_table *table, int write,
+				 struct file *filp, void __user *buffer,
+				 size_t *lenp, loff_t *ppos);
+


 /* This instruction is vmcall. On non-VT architectures, it will generate a
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 555aab0..d90976b 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -27,7 +27,7 @@
 #define KVM_SCALE 22

 static int kvmclock = 1;
-static unsigned int kvm_wall_update_interval = 0;
+unsigned int kvm_wall_update_interval = 0;

 static int parse_no_kvmclock(char *arg)
 {
@@ -99,6 +99,21 @@ static __init int init_updates(void)
  */
 late_initcall(init_updates);

+int kvm_sync_wall_handler(struct ctl_table *table, int write,
+			  struct file *filp, void __user *buffer,
+			  size_t *lenp, loff_t *ppos)
+{
+	int ret = proc_dointvec_minmax(table, write, filp, buffer, lenp, ppos);
+
+	if (ret || !write)
+		return ret;
+
+	cancel_delayed_work_sync(&kvm_sync_wall_work);
+
+	schedule_next_update();
+	return 0;
+}
+
 /*
  * The wallclock is the time of day when we booted. Since then, some time may
  * have elapsed since the hypervisor wrote the data. So we try to account for
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 98e0232..b787c81 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -51,6 +51,7 @@
 #include <linux/ftrace.h>
 #include <linux/slow-work.h>
 #include <linux/perf_counter.h>
+#include <linux/kvm_para.h>

 #include <asm/uaccess.h>
 #include <asm/processor.h>
@@ -989,6 +990,18 @@ static struct ctl_table kern_table[] = {
 		.proc_handler = &proc_dointvec,
 	},
 #endif
+#ifdef CONFIG_KVM_CLOCK
+	{
+		.ctl_name = CTL_UNNUMBERED,
+		.procname = "kvm_sync_wallclock",
+		.data = &kvm_wall_update_interval,
+		.maxlen = sizeof(kvm_wall_update_interval),
+		.mode = 0644,
+		.proc_handler = &kvm_sync_wall_handler,
+		.strategy = &sysctl_intvec,
+		.extra1 = &zero,
+	},
+#endif

 /*
  * NOTE: do not add new entries to this table unless you have read
--
1.6.2.2

2009-09-08 18:42:14

by Marcelo Tosatti

Subject: Re: [PATCH v2 1/2] keep guest wallclock in sync with host clock

On Wed, Sep 02, 2009 at 10:34:57AM -0400, Glauber Costa wrote:
> KVM clock is great to avoid drifting in guest VMs running ontop of kvm.
> However, the current mechanism will not propagate changes in wallclock value
> upwards. This effectively means that in a large pool of VMs that need accurate timing,
> all of them has to run NTP, instead of just the host doing it.
>
> Since the host updates information in the shared memory area upon msr writes,
> this patch introduces a worker that writes to that msr, and calls do_settimeofday
> at fixed intervals, with second resolution. A interval of 0 determines that we
> are not interested in this behaviour. A later patch will make this optional at
> runtime
>
> Signed-off-by: Glauber Costa <[email protected]>

As mentioned before, ntp already does this (and it's not that heavy, is
it?).

For example, if ntp is running on the host, it avoids stepping the clock
backwards by adjusting it slowly, while the periodic frequency adjustment on
the guest bypasses that.
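
The difference between stepping and slewing can be illustrated with a small userspace model. This is a simplified sketch, not how ntpd or the kernel implement it: `step_clock` and `slew_clock` are hypothetical names, and the rate cap is passed in as an assumed parameter (500 ppm is a commonly quoted limit for slewing).

```c
#include <assert.h>
#include <stdint.h>

/* Stepping applies the whole offset at once; a negative offset moves time backwards. */
int64_t step_clock(int64_t clock_ns, int64_t offset_ns)
{
    return clock_ns + offset_ns;
}

/*
 * Slewing advances the clock by the elapsed time plus a bounded share of the
 * outstanding offset (at most max_ppm parts per million of the elapsed time).
 * *offset_ns is reduced by the amount actually applied.
 */
int64_t slew_clock(int64_t clock_ns, int64_t elapsed_ns,
                   int64_t *offset_ns, int64_t max_ppm)
{
    int64_t limit, adj;

    assert(offset_ns && elapsed_ns >= 0 && max_ppm >= 0 && max_ppm < 1000000);
    limit = elapsed_ns * max_ppm / 1000000;
    adj = *offset_ns;
    if (adj > limit)
        adj = limit;
    if (adj < -limit)
        adj = -limit;

    *offset_ns -= adj;
    return clock_ns + elapsed_ns + adj;
}
```

For a clock that is two seconds ahead, stepping moves it backwards immediately, whereas slewing at 500 ppm over one second removes only 500 microseconds and the clock keeps moving forward.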


2009-09-08 19:37:54

by Glauber Costa

Subject: Re: [PATCH v2 1/2] keep guest wallclock in sync with host clock

On Tue, Sep 08, 2009 at 03:41:59PM -0300, Marcelo Tosatti wrote:
> On Wed, Sep 02, 2009 at 10:34:57AM -0400, Glauber Costa wrote:
> > KVM clock is great to avoid drifting in guest VMs running ontop of kvm.
> > However, the current mechanism will not propagate changes in wallclock value
> > upwards. This effectively means that in a large pool of VMs that need accurate timing,
> > all of them has to run NTP, instead of just the host doing it.
> >
> > Since the host updates information in the shared memory area upon msr writes,
> > this patch introduces a worker that writes to that msr, and calls do_settimeofday
> > at fixed intervals, with second resolution. A interval of 0 determines that we
> > are not interested in this behaviour. A later patch will make this optional at
> > runtime
> >
> > Signed-off-by: Glauber Costa <[email protected]>
>
> As mentioned before, ntp already does this (and its not that heavy is
> it?).
>
> For example, if ntp running on the host, it avoids stepping the clock
> backwards by slow adjustment, while the periodic frequency adjustment on
> the guest bypasses that.

Simple question: How do I run ntp in guests without network?

2009-09-08 20:01:29

by Marcelo Tosatti

Subject: Re: [PATCH v2 1/2] keep guest wallclock in sync with host clock

On Tue, Sep 08, 2009 at 04:37:52PM -0300, Glauber Costa wrote:
> On Tue, Sep 08, 2009 at 03:41:59PM -0300, Marcelo Tosatti wrote:
> > On Wed, Sep 02, 2009 at 10:34:57AM -0400, Glauber Costa wrote:
> > > KVM clock is great to avoid drifting in guest VMs running ontop of kvm.
> > > However, the current mechanism will not propagate changes in wallclock value
> > > upwards. This effectively means that in a large pool of VMs that need accurate timing,
> > > all of them has to run NTP, instead of just the host doing it.
> > >
> > > Since the host updates information in the shared memory area upon msr writes,
> > > this patch introduces a worker that writes to that msr, and calls do_settimeofday
> > > at fixed intervals, with second resolution. A interval of 0 determines that we
> > > are not interested in this behaviour. A later patch will make this optional at
> > > runtime
> > >
> > > Signed-off-by: Glauber Costa <[email protected]>
> >
> > As mentioned before, ntp already does this (and its not that heavy is
> > it?).
> >
> > For example, if ntp running on the host, it avoids stepping the clock
> > backwards by slow adjustment, while the periodic frequency adjustment on
> > the guest bypasses that.
>
> Simple question: How do I run ntp in guests without network?

You don't.

2009-09-08 20:12:46

by Anthony Liguori

Subject: Re: [PATCH v2 1/2] keep guest wallclock in sync with host clock

Marcelo Tosatti wrote:
>>
>> Simple question: How do I run ntp in guests without network?
>>
>
> You don't.
>
Why bother doing this in the kernel? Isn't this the sort of thing
vmchannel is supposed to handle? open-vm-tools does this.

/me ducks

Regards,

Anthony Liguori


2009-09-08 20:15:15

by Glauber Costa

Subject: Re: [PATCH v2 1/2] keep guest wallclock in sync with host clock

On Tue, Sep 08, 2009 at 05:00:04PM -0300, Marcelo Tosatti wrote:
> On Tue, Sep 08, 2009 at 04:37:52PM -0300, Glauber Costa wrote:
> > On Tue, Sep 08, 2009 at 03:41:59PM -0300, Marcelo Tosatti wrote:
> > > On Wed, Sep 02, 2009 at 10:34:57AM -0400, Glauber Costa wrote:
> > > > KVM clock is great to avoid drifting in guest VMs running ontop of kvm.
> > > > However, the current mechanism will not propagate changes in wallclock value
> > > > upwards. This effectively means that in a large pool of VMs that need accurate timing,
> > > > all of them has to run NTP, instead of just the host doing it.
> > > >
> > > > Since the host updates information in the shared memory area upon msr writes,
> > > > this patch introduces a worker that writes to that msr, and calls do_settimeofday
> > > > at fixed intervals, with second resolution. A interval of 0 determines that we
> > > > are not interested in this behaviour. A later patch will make this optional at
> > > > runtime
> > > >
> > > > Signed-off-by: Glauber Costa <[email protected]>
> > >
> > > As mentioned before, ntp already does this (and its not that heavy is
> > > it?).
> > >
> > > For example, if ntp running on the host, it avoids stepping the clock
> > > backwards by slow adjustment, while the periodic frequency adjustment on
> > > the guest bypasses that.
> >
> > Simple question: How do I run ntp in guests without network?
>
> You don't.
For those guests, the mechanism I am proposing comes in handy.

Furthermore, it is not only optional, but disabled by default. So even if you
have a network but a genuine reason not to run ntp in your VMs, you can use it too.