LinuxLists.cc - KVM paravirt clocksource - Take 3 out of <put your number here>

2007-11-06 21:18:52

Subject: KVM paravirt clocksource - Take 3 out of <put your number here>

This is a new version of kvm paravirt clock implementation

This time, the clockevents part was completely wiped out.
Not that I'm dropping it: As said in my last message, I'm starting
to take the path of having a specialized irq chip, as for avi-san's
suggestion.

However, last iteration made clocksources and clockevents a lot more
independent, and so, turning it into a clocksource-only implementation
was just a matter of deleting code - that can be later added almost as-is.
It also favours people willing to use the clocksource, but going towards
userspace HPET emulation, or things like this.

The goal is to have this part included independently of the other bits.

>From last release:

* no more clockevents (for a while, hang on!)
* per-vcpu hv_clock area, which led to...
* no more purely tsc timing.

If you have a new concern with this version, or I failed to address a previous concern of yours, just voice it!

2007-11-06 21:19:58

by Glauber Costa

[permalink] [raw]

Subject: kvmclock implementation, the guest part.

This is the guest part of kvm clock implementation
It does not do tsc-only timing, as tsc can have deltas
between cpus, and it did not seem worthy to me to keep
adjusting them.

We do use it, however, for fine-grained adjustment.

Other than that, time comes from the host.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
arch/i386/Kconfig | 10 +++
arch/x86/kernel/Makefile_32 | 1 +
arch/x86/kernel/kvmclock.c | 164 +++++++++++++++++++++++++++++++++++++++++++
arch/x86/kernel/setup_32.c | 5 ++
4 files changed, 180 insertions(+), 0 deletions(-)
create mode 100644 arch/x86/kernel/kvmclock.c

diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index b4437ce..a3b45f1 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -257,6 +257,16 @@ config VMI
at the moment), by linking the kernel to a GPL-ed ROM module
provided by the hypervisor.

+config KVM_CLOCK
+ bool "KVM paravirtualized clock"
+ select PARAVIRT
+ help
+ Turning on this option will allow you to run a paravirtualized clock
+ when running over the KVM hypervisor. Instead of relying on a PIT
+ (or probably other) emulation by the underlying device model, the host
+ provides the guest with timing infrastructure, as time of day, and
+ timer expiration.
+
source "arch/x86/lguest/Kconfig"

endif
diff --git a/arch/x86/kernel/Makefile_32 b/arch/x86/kernel/Makefile_32
index b9d6798..df76d8c 100644
--- a/arch/x86/kernel/Makefile_32
+++ b/arch/x86/kernel/Makefile_32
@@ -43,6 +43,7 @@ obj-$(CONFIG_K8_NB) += k8.o
obj-$(CONFIG_MGEODE_LX) += geode_32.o mfgpt_32.o

obj-$(CONFIG_VMI) += vmi_32.o vmiclock_32.o
+obj-$(CONFIG_KVM_CLOCK) += kvmclock.o
obj-$(CONFIG_PARAVIRT) += paravirt_32.o
obj-y += pcspeaker.o

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
new file mode 100644
index 0000000..8778d61
--- /dev/null
+++ b/arch/x86/kernel/kvmclock.c
@@ -0,0 +1,164 @@
+/* KVM paravirtual clock driver. A clocksource implementation
+ Copyright (C) 2007 Glauber de Oliveira Costa, Red Hat Inc.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+*/
+
+#include <linux/clocksource.h>
+#include <linux/clockchips.h>
+#include <linux/interrupt.h>
+#include <linux/kvm_para.h>
+#include <linux/ktime.h>
+#include <asm/arch_hooks.h>
+#include <asm/i8253.h>
+
+#include <mach_ipi.h>
+#include <irq_vectors.h>
+
+#define KVM_SCALE 22
+
+static int kvmclock = 1;
+
+static int parse_no_kvmclock(char *arg)
+{
+ kvmclock = 0;
+ return 0;
+}
+early_param("no-kvmclock", parse_no_kvmclock);
+
+/* The hypervisor will put information about time periodically here */
+union kvm_hv_clock hv_clock[NR_CPUS] __attribute__((__aligned__(PAGE_SIZE)));
+
+/*
+ * The wallclock is the time of day when we booted. Since then, some time may
+ * have elapsed since the hypervisor wrote the data. So we try to account for
+ * that. Even if the tsc is not accurate, it gives us a more accurate timing
+ * than not adjusting at all
+ */
+unsigned long kvm_get_wallclock(void)
+{
+ u64 wc_sec, delta, last_tsc;
+ struct timespec ts;
+ int version, nsec, cpu = smp_processor_id();
+
+ do {
+ version = hv_clock[cpu].version;
+ rmb();
+ last_tsc = hv_clock[cpu].last_tsc;
+ rmb();
+ wc_sec = hv_clock[cpu].wc_sec;
+ rmb();
+ } while ((hv_clock[cpu].version != version) && !(version & 1));
+
+ rdtscll(delta);
+ delta = delta - last_tsc;
+ delta = (delta * hv_clock[cpu].tsc_mult) >> KVM_SCALE;
+ nsec = do_div(delta, NSEC_PER_SEC);
+ set_normalized_timespec(&ts, wc_sec + delta, nsec);
+
+ /*
+ * Of all mechanisms of time adjustment I've tested, this one
+ * was the champion!
+ */
+ return ts.tv_sec + 1;
+}
+
+int kvm_set_wallclock(unsigned long now)
+{
+ return 0;
+}
+
+/*
+ * This is our read_clock function. The host puts an tsc timestamp each time
+ * it updates a new time, and then we can use it to derive a slightly more
+ * precise notion of elapsed time, converted to nanoseconds.
+ */
+static cycle_t kvm_clock_read(void)
+{
+
+ u64 delta, last_tsc, now;
+ u32 version;
+ int cpu = smp_processor_id();
+
+ do {
+ version = hv_clock[cpu].version;
+ rmb();
+ last_tsc = hv_clock[cpu].last_tsc;
+ rmb();
+ now = hv_clock[cpu].now_ns;
+ rmb();
+ } while ((hv_clock[cpu].version != version) && !(version & 1));
+
+ delta = native_read_tsc() - last_tsc;
+ delta = (delta * hv_clock[cpu].tsc_mult) >> KVM_SCALE;
+
+ return now + delta;
+}
+
+static struct clocksource kvm_clock = {
+ .name = "kvm-clock",
+ .read = kvm_clock_read,
+ .rating = 400,
+ .mask = CLOCKSOURCE_MASK(64),
+ .mult = 1 << KVM_SCALE,
+ .shift = KVM_SCALE,
+ .flags = CLOCK_SOURCE_IS_CONTINUOUS,
+};
+
+unsigned long long kvm_sched_clock(void)
+{
+ return kvm_clock_read();
+}
+
+static int kvm_register_clock(unsigned int cpu)
+{
+ unsigned long kvm_clock_info = __pa((unsigned long)&hv_clock[cpu]);
+ return kvm_hypercall2(KVM_HCALL_REGISTER_CLOCK, kvm_clock_info, cpu);
+}
+
+int kvm_cpu_up(unsigned int cpu)
+{
+ /*
+ * Now that the first cpu already had this clocksource initialized,
+ * we shouldn't fail.
+ */
+ WARN_ON(kvm_register_clock(cpu));
+ return native_cpu_up(cpu);
+}
+
+void __init kvmclock_init(void)
+{
+ int cpu = smp_processor_id();
+ int r;
+
+ /*
+ * If we can't use the paravirt clock, just go with
+ * the usual timekeeping
+ */
+ if (!kvm_para_available())
+ return;
+
+ r = kvm_register_clock(cpu);
+ if (r)
+ return;
+
+ if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) {
+ pv_time_ops.get_wallclock = kvm_get_wallclock;
+ pv_time_ops.set_wallclock = kvm_set_wallclock;
+ pv_time_ops.sched_clock = kvm_sched_clock;
+ smp_ops.cpu_up = kvm_cpu_up;
+ clocksource_register(&kvm_clock);
+ }
+}
diff --git a/arch/x86/kernel/setup_32.c b/arch/x86/kernel/setup_32.c
index cc0e914..a6cfd47 100644
--- a/arch/x86/kernel/setup_32.c
+++ b/arch/x86/kernel/setup_32.c
@@ -44,6 +44,7 @@
#include <linux/crash_dump.h>
#include <linux/dmi.h>
#include <linux/pfn.h>
+#include <linux/kvm_para.h>

#include <video/edid.h>

@@ -617,6 +618,10 @@ void __init setup_arch(char **cmdline_p)

max_low_pfn = setup_memory();

+#ifdef CONFIG_KVM_CLOCK
+ kvmclock_init();
+#endif
+
#ifdef CONFIG_VMI
/*
* Must be after max_low_pfn is determined, and before kernel
--
1.5.0.6

2007-11-06 21:21:46

by Glauber Costa

[permalink] [raw]

Subject: include files for kvmclock

This patch introduces the include files for kvm clock.
They'll be needed for both guest and host part.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
include/asm-x86/kvm_para.h | 23 +++++++++++++++++++++++
include/linux/kvm.h | 1 +
include/linux/kvm_para.h | 20 ++++++++++++++++++++
3 files changed, 44 insertions(+), 0 deletions(-)

diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h
index c6f3fd8..af9fb75 100644
--- a/include/asm-x86/kvm_para.h
+++ b/include/asm-x86/kvm_para.h
@@ -10,15 +10,38 @@
* paravirtualization, the appropriate feature bit should be checked.
*/
#define KVM_CPUID_FEATURES 0x40000001
+#define KVM_FEATURE_CLOCKEVENTS 0
+#define KVM_FEATURE_CLOCKSOURCE 1
+

#ifdef __KERNEL__
#include <asm/processor.h>
+extern void kvmclock_init(void);
+
+union kvm_hv_clock {
+ struct {
+ u64 tsc_mult;
+ u64 now_ns;
+ /* That's the wall clock, not the water closet */
+ u64 wc_sec;
+ u64 wc_nsec;
+ u64 last_tsc;
+ /* At first, we could use the tsc value as a marker, but Jeremy
+ * well noted that it will cause us locking problems in 32-bit
+ * sys, so we have a special version field */
+ u32 version;
+ };
+ char page_align[PAGE_SIZE];
+};
+

/* This instruction is vmcall. On non-VT architectures, it will generate a
* trap that we will then rewrite to the appropriate instruction.
*/
#define KVM_HYPERCALL ".byte 0x0f,0x01,0xc1"

+#define KVM_HCALL_REGISTER_CLOCK 1
+
/* For KVM hypercalls, a three-byte sequence of either the vmrun or the vmmrun
* instruction. The hypervisor may replace it with something else but only the
* instructions are guaranteed to be supported.
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 71d33d6..7ac8786 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -359,6 +359,7 @@ struct kvm_signal_mask {
#define KVM_CAP_MMU_SHADOW_CACHE_CONTROL 2
#define KVM_CAP_USER_MEMORY 3
#define KVM_CAP_SET_TSS_ADDR 4
+#define KVM_CAP_CLK 5

/*
* ioctls for VM fds
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index e4db25f..567a192 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -11,8 +11,28 @@

/* Return values for hypercalls */
#define KVM_ENOSYS 1000
+#define KVM_ENODEV 1019
+#define KVM_EINVAL 1022

#ifdef __KERNEL__
+#define KVM_HCALL_REGISTER_CLOCK 1
+
+union kvm_hv_clock {
+ struct {
+ u64 tsc_mult;
+ u64 now_ns;
+ /* That's the wall clock, not the water closet */
+ u64 wc_sec;
+ u64 wc_nsec;
+ u64 last_tsc;
+ /* At first, we could use the tsc value as a marker, but Jeremy
+ * well noted that it will cause us locking problems in 32-bit
+ * sys, so we have a special version field */
+ u32 version;
+ };
+ char page_align[PAGE_SIZE];
+};
+
/*
* hypercalls use architecture specific
*/
--
1.5.0.6

2007-11-06 21:23:17

by Glauber Costa

[permalink] [raw]

Subject: kvmclock - the host part.

This is the host part of kvm clocksource implementation. As it does
not include clockevents, it is a fairly simple implementation. We
only have to register a per-vcpu area, and start writting to it periodically.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
drivers/kvm/irq.c | 1 +
drivers/kvm/kvm_main.c | 2 +
drivers/kvm/svm.c | 1 +
drivers/kvm/vmx.c | 1 +
drivers/kvm/x86.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++
drivers/kvm/x86.h | 13 ++++++++++
6 files changed, 77 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/irq.c b/drivers/kvm/irq.c
index 22bfeee..0344879 100644
--- a/drivers/kvm/irq.c
+++ b/drivers/kvm/irq.c
@@ -92,6 +92,7 @@ void kvm_vcpu_kick_request(struct kvm_vcpu *vcpu, int request)

void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu)
{
+ vcpu->time_needs_update = 1;
kvm_inject_apic_timer_irqs(vcpu);
/* TODO: PIT, RTC etc. */
}
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 0b8edca..5834573 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -20,6 +20,7 @@
#include "x86_emulate.h"
#include "irq.h"

+#include <linux/clocksource.h>
#include <linux/kvm.h>
#include <linux/module.h>
#include <linux/errno.h>
@@ -1242,6 +1243,7 @@ static long kvm_dev_ioctl(struct file *filp,
case KVM_CAP_MMU_SHADOW_CACHE_CONTROL:
case KVM_CAP_USER_MEMORY:
case KVM_CAP_SET_TSS_ADDR:
+ case KVM_CAP_CLK:
r = 1;
break;
default:
diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index 95a3489..cb8c19d 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -617,6 +617,7 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)

__free_page(pfn_to_page(svm->vmcb_pa >> PAGE_SHIFT));
kvm_vcpu_uninit(vcpu);
+ release_clock(vcpu);
kmem_cache_free(kvm_vcpu_cache, svm);
}

diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index da3a339..b5edeed 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -2501,6 +2501,7 @@ static void vmx_free_vcpu(struct kvm_vcpu *vcpu)
kfree(vmx->host_msrs);
kfree(vmx->guest_msrs);
kvm_vcpu_uninit(vcpu);
+ release_clock(vcpu);
kmem_cache_free(kvm_vcpu_cache, vmx);
}

diff --git a/drivers/kvm/x86.c b/drivers/kvm/x86.c
index e905d46..d476488 100644
--- a/drivers/kvm/x86.c
+++ b/drivers/kvm/x86.c
@@ -19,6 +19,7 @@
#include "segment_descriptor.h"
#include "irq.h"

+#include <linux/clocksource.h>
#include <linux/kvm.h>
#include <linux/fs.h>
#include <linux/vmalloc.h>
@@ -1628,6 +1629,31 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_emulate_halt);

+static void kvm_write_guest_time(struct kvm_vcpu *vcpu)
+{
+ struct timespec ts;
+ void *clock_addr;
+
+
+ if (!vcpu->clock_page)
+ return;
+
+ /* Updates version to the next odd number, indicating we're writing */
+ vcpu->hv_clock.version++;
+ /* Updating the tsc count is the first thing we do */
+ kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER, &vcpu->hv_clock.last_tsc);
+ ktime_get_ts(&ts);
+ vcpu->hv_clock.now_ns = ts.tv_nsec + (NSEC_PER_SEC * (u64)ts.tv_sec);
+ vcpu->hv_clock.wc_sec = get_seconds();
+ vcpu->hv_clock.version++;
+
+ clock_addr = vcpu->clock_addr;
+ memcpy(clock_addr, &vcpu->hv_clock, sizeof(vcpu->hv_clock));
+ mark_page_dirty(vcpu->kvm, vcpu->clock_gfn);
+
+ vcpu->time_needs_update = 0;
+}
+
int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
{
unsigned long nr, a0, a1, a2, a3, ret;
@@ -1648,7 +1674,33 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
a3 &= 0xFFFFFFFF;
}

+ ret = 0;
switch (nr) {
+ case KVM_HCALL_REGISTER_CLOCK: {
+ struct kvm_vcpu *dst_vcpu;
+
+ if (!((a1 < KVM_MAX_VCPUS) && (vcpu->kvm->vcpus[a1]))) {
+ ret = -KVM_EINVAL;
+ break;
+ }
+
+ dst_vcpu = vcpu->kvm->vcpus[a1];
+ dst_vcpu->clock_page = gfn_to_page(vcpu->kvm, a0 >> PAGE_SHIFT);
+
+ if (!dst_vcpu->clock_page) {
+ ret = -KVM_EINVAL;
+ break;
+ }
+ dst_vcpu->clock_gfn = a0 >> PAGE_SHIFT;
+
+ dst_vcpu->hv_clock.tsc_mult = clocksource_khz2mult(tsc_khz, 22);
+ dst_vcpu->clock_addr = kmap(dst_vcpu->clock_page);
+
+ dst_vcpu->time_needs_update = 1;
+
+ break;
+ }
+
default:
ret = -KVM_ENOSYS;
break;
@@ -1816,6 +1868,12 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu,
vcpu->irq_summary == 0);
}

+void kvm_update_guest_time(struct kvm_vcpu *vcpu)
+{
+ if (vcpu->time_needs_update)
+ kvm_write_guest_time(vcpu);
+}
+
static int __vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
{
int r;
@@ -1876,6 +1934,7 @@ again:
vcpu->guest_mode = 1;
kvm_guest_enter();

+ kvm_update_guest_time(vcpu);
kvm_x86_ops->run(vcpu, kvm_run);

vcpu->guest_mode = 0;
diff --git a/drivers/kvm/x86.h b/drivers/kvm/x86.h
index 663b822..7a951cc 100644
--- a/drivers/kvm/x86.h
+++ b/drivers/kvm/x86.h
@@ -15,6 +15,7 @@

#include <linux/types.h>
#include <linux/mm.h>
+#include <linux/highmem.h>

#include <linux/kvm.h>
#include <linux/kvm_para.h>
@@ -83,8 +84,20 @@ struct kvm_vcpu {
/* emulate context */

struct x86_emulate_ctxt emulate_ctxt;
+
+ union kvm_hv_clock hv_clock;
+ int time_needs_update;
+ struct page *clock_page;
+ void *clock_addr; /* address in host space */
+ u64 clock_gfn; /* guest frame number, physical addr */
+
};

+static inline void release_clock(struct kvm_vcpu *vcpu)
+{
+ if (vcpu->clock_page)
+ kunmap(vcpu->clock_page);
+}
int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u32 error_code);

static inline void kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu)
--
1.5.0.6

2007-11-06 21:35:50

by Glauber Costa

[permalink] [raw]

Subject: Re: [kvm-devel] include files for kvmclock

On 11/6/07, Glauber de Oliveira Costa <[email protected]> wrote:
> This patch introduces the include files for kvm clock.
> They'll be needed for both guest and host part.

And of course, this was my test files by mistake ;-)
Oh god... ;-)

Patches aren't numbered but this one should go first. And please just ignore,
the replica of the hv_clock union definition in the patch bellow. It
should all go in asm/kvm_para.h

The other two patches are fine, and can be applied in any order, so
it's not worth resending. I'll grab your comments first.

> Signed-off-by: Glauber de Oliveira Costa <[email protected]>
> ---
> include/asm-x86/kvm_para.h | 23 +++++++++++++++++++++++
> include/linux/kvm.h | 1 +
> include/linux/kvm_para.h | 20 ++++++++++++++++++++
> 3 files changed, 44 insertions(+), 0 deletions(-)
>
> diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h
> index c6f3fd8..af9fb75 100644
> --- a/include/asm-x86/kvm_para.h
> +++ b/include/asm-x86/kvm_para.h
> @@ -10,15 +10,38 @@
> * paravirtualization, the appropriate feature bit should be checked.
> */
> #define KVM_CPUID_FEATURES 0x40000001
> +#define KVM_FEATURE_CLOCKEVENTS 0
> +#define KVM_FEATURE_CLOCKSOURCE 1
> +
>
> #ifdef __KERNEL__
> #include <asm/processor.h>
> +extern void kvmclock_init(void);
> +
> +union kvm_hv_clock {
> + struct {
> + u64 tsc_mult;
> + u64 now_ns;
> + /* That's the wall clock, not the water closet */
> + u64 wc_sec;
> + u64 wc_nsec;
> + u64 last_tsc;
> + /* At first, we could use the tsc value as a marker, but Jeremy
> + * well noted that it will cause us locking problems in 32-bit
> + * sys, so we have a special version field */
> + u32 version;
> + };
> + char page_align[PAGE_SIZE];
> +};
> +
>
> /* This instruction is vmcall. On non-VT architectures, it will generate a
> * trap that we will then rewrite to the appropriate instruction.
> */
> #define KVM_HYPERCALL ".byte 0x0f,0x01,0xc1"
>
> +#define KVM_HCALL_REGISTER_CLOCK 1
> +
> /* For KVM hypercalls, a three-byte sequence of either the vmrun or the vmmrun
> * instruction. The hypervisor may replace it with something else but only the
> * instructions are guaranteed to be supported.
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 71d33d6..7ac8786 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -359,6 +359,7 @@ struct kvm_signal_mask {
> #define KVM_CAP_MMU_SHADOW_CACHE_CONTROL 2
> #define KVM_CAP_USER_MEMORY 3
> #define KVM_CAP_SET_TSS_ADDR 4
> +#define KVM_CAP_CLK 5
>
> /*
> * ioctls for VM fds
> diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
> index e4db25f..567a192 100644
> --- a/include/linux/kvm_para.h
> +++ b/include/linux/kvm_para.h
> @@ -11,8 +11,28 @@
>
> /* Return values for hypercalls */
> #define KVM_ENOSYS 1000
> +#define KVM_ENODEV 1019
> +#define KVM_EINVAL 1022
>
> #ifdef __KERNEL__
> +#define KVM_HCALL_REGISTER_CLOCK 1
> +
> +union kvm_hv_clock {
> + struct {
> + u64 tsc_mult;
> + u64 now_ns;
> + /* That's the wall clock, not the water closet */
> + u64 wc_sec;
> + u64 wc_nsec;
> + u64 last_tsc;
> + /* At first, we could use the tsc value as a marker, but Jeremy
> + * well noted that it will cause us locking problems in 32-bit
> + * sys, so we have a special version field */
> + u32 version;
> + };
> + char page_align[PAGE_SIZE];
> +};
> +
> /*
> * hypercalls use architecture specific
> */
> --
> 1.5.0.6
>
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc.
> Still grepping through log files to find problems? Stop.
> Now Search log events and configuration files using AJAX and a browser.
> Download your FREE copy of Splunk now >> http://get.splunk.com/
> _______________________________________________
> kvm-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/kvm-devel
>

--
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."

2007-11-06 22:50:23

by Jeremy Fitzhardinge

[permalink] [raw]

Subject: Re: include files for kvmclock

Glauber de Oliveira Costa wrote:
> This patch introduces the include files for kvm clock.
> They'll be needed for both guest and host part.
>
> Signed-off-by: Glauber de Oliveira Costa <[email protected]>
> ---
> include/asm-x86/kvm_para.h | 23 +++++++++++++++++++++++
> include/linux/kvm.h | 1 +
> include/linux/kvm_para.h | 20 ++++++++++++++++++++
> 3 files changed, 44 insertions(+), 0 deletions(-)
>
> diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h
> index c6f3fd8..af9fb75 100644
> --- a/include/asm-x86/kvm_para.h
> +++ b/include/asm-x86/kvm_para.h
> @@ -10,15 +10,38 @@
> * paravirtualization, the appropriate feature bit should be checked.
> */
> #define KVM_CPUID_FEATURES 0x40000001
> +#define KVM_FEATURE_CLOCKEVENTS 0
> +#define KVM_FEATURE_CLOCKSOURCE 1
> +
>
> #ifdef __KERNEL__
> #include <asm/processor.h>
> +extern void kvmclock_init(void);
> +
> +union kvm_hv_clock {
>

Why two copies of this structure?

> + struct {
> + u64 tsc_mult;
> + u64 now_ns;
> + /* That's the wall clock, not the water closet */
> + u64 wc_sec;
> + u64 wc_nsec;
> + u64 last_tsc;
> + /* At first, we could use the tsc value as a marker, but Jeremy
> + * well noted that it will cause us locking problems in 32-bit
> + * sys, so we have a special version field */
> + u32 version;
> + };
> + char page_align[PAGE_SIZE];
> +};
> +
>

[...]

> +
> +union kvm_hv_clock {
> + struct {
> + u64 tsc_mult;
> + u64 now_ns;
> + /* That's the wall clock, not the water closet */
> + u64 wc_sec;
> + u64 wc_nsec;
> + u64 last_tsc;
> + /* At first, we could use the tsc value as a marker, but Jeremy
> + * well noted that it will cause us locking problems in 32-bit
> + * sys, so we have a special version field */
> + u32 version;
> + };
> + char page_align[PAGE_SIZE];
> +};
> +
> /*
> * hypercalls use architecture specific
> */
>

2007-11-06 22:58:18

by Glauber Costa

[permalink] [raw]

Subject: Re: include files for kvmclock

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jeremy Fitzhardinge escreveu:
> Glauber de Oliveira Costa wrote:
>> This patch introduces the include files for kvm clock.
>> They'll be needed for both guest and host part.
>>
>> Signed-off-by: Glauber de Oliveira Costa <[email protected]>
>> ---
>> include/asm-x86/kvm_para.h | 23 +++++++++++++++++++++++
>> include/linux/kvm.h | 1 +
>> include/linux/kvm_para.h | 20 ++++++++++++++++++++
>> 3 files changed, 44 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h
>> index c6f3fd8..af9fb75 100644
>> --- a/include/asm-x86/kvm_para.h
>> +++ b/include/asm-x86/kvm_para.h
>> @@ -10,15 +10,38 @@
>> * paravirtualization, the appropriate feature bit should be checked.
>> */
>> #define KVM_CPUID_FEATURES 0x40000001
>> +#define KVM_FEATURE_CLOCKEVENTS 0
>> +#define KVM_FEATURE_CLOCKSOURCE 1
>> +
>>
>> #ifdef __KERNEL__
>> #include <asm/processor.h>
>> +extern void kvmclock_init(void);
>> +
>> +union kvm_hv_clock {
>>
>
> Why two copies of this structure?
>
It's called silly mistake. ;-)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org

iD8DBQFHMPGvjYI8LaFUWXMRAgt2AJ9NKgq2LCueUidH56ZgUYA+5wBhGwCfdqQB
otFP1/SFowaANQ8FojEtJUE=
=8Xqp
-----END PGP SIGNATURE-----

2007-11-07 05:50:43

by Avi Kivity

[permalink] [raw]

Subject: Re: kvmclock - the host part.

Glauber de Oliveira Costa wrote:
> This is the host part of kvm clocksource implementation. As it does
> not include clockevents, it is a fairly simple implementation. We
> only have to register a per-vcpu area, and start writting to it periodically.
>
> Signed-off-by: Glauber de Oliveira Costa <[email protected]>
> ---
> drivers/kvm/irq.c | 1 +
> drivers/kvm/kvm_main.c | 2 +
> drivers/kvm/svm.c | 1 +
> drivers/kvm/vmx.c | 1 +
> drivers/kvm/x86.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++
> drivers/kvm/x86.h | 13 ++++++++++
> 6 files changed, 77 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/kvm/irq.c b/drivers/kvm/irq.c
> index 22bfeee..0344879 100644
> --- a/drivers/kvm/irq.c
> +++ b/drivers/kvm/irq.c
> @@ -92,6 +92,7 @@ void kvm_vcpu_kick_request(struct kvm_vcpu *vcpu, int request)
>
> void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu)
> {
> + vcpu->time_needs_update = 1;
>

Why here and not in __vcpu_run()? It isn't timer irq related.

> @@ -1242,6 +1243,7 @@ static long kvm_dev_ioctl(struct file *filp,
> case KVM_CAP_MMU_SHADOW_CACHE_CONTROL:
> case KVM_CAP_USER_MEMORY:
> case KVM_CAP_SET_TSS_ADDR:
> + case KVM_CAP_CLK:
>

It's just a clock source now, right? so _CLOCK_SOURCE.

>
> +static void kvm_write_guest_time(struct kvm_vcpu *vcpu)
> +{
> + struct timespec ts;
> + void *clock_addr;
> +
> +
> + if (!vcpu->clock_page)
> + return;
> +
> + /* Updates version to the next odd number, indicating we're writing */
> + vcpu->hv_clock.version++;
>

No one can actually see this as you're updating a private structure.
You need to copy it to guestspace.

> + /* Updating the tsc count is the first thing we do */
> + kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER, &vcpu->hv_clock.last_tsc);
> + ktime_get_ts(&ts);
> + vcpu->hv_clock.now_ns = ts.tv_nsec + (NSEC_PER_SEC * (u64)ts.tv_sec);
> + vcpu->hv_clock.wc_sec = get_seconds();
> + vcpu->hv_clock.version++;
> +
> + clock_addr = vcpu->clock_addr;
> + memcpy(clock_addr, &vcpu->hv_clock, sizeof(vcpu->hv_clock));
> + mark_page_dirty(vcpu->kvm, vcpu->clock_gfn);
>

Just use kvm_write_guest().

> +
> + vcpu->time_needs_update = 0;
> +}
> +
> int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> {
> unsigned long nr, a0, a1, a2, a3, ret;
> @@ -1648,7 +1674,33 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> a3 &= 0xFFFFFFFF;
> }
>
> + ret = 0;
> switch (nr) {
> + case KVM_HCALL_REGISTER_CLOCK: {
> + struct kvm_vcpu *dst_vcpu;
> +
> + if (!((a1 < KVM_MAX_VCPUS) && (vcpu->kvm->vcpus[a1]))) {
> + ret = -KVM_EINVAL;
> + break;
> + }
> +
> + dst_vcpu = vcpu->kvm->vcpus[a1];
>

What if !dst_vcpu? What about locking?

Suggest simply using vcpu. Every guest cpu can register its own
clocksource.

> + dst_vcpu->clock_page = gfn_to_page(vcpu->kvm, a0 >> PAGE_SHIFT);
>

Shift right? Why?

> +
> + if (!dst_vcpu->clock_page) {
>

IIRC gfn_to_page() never returns NULL, need a different check.

> + ret = -KVM_EINVAL;
> + break;
> + }
> + dst_vcpu->clock_gfn = a0 >> PAGE_SHIFT;
> +
> + dst_vcpu->hv_clock.tsc_mult = clocksource_khz2mult(tsc_khz, 22);
> + dst_vcpu->clock_addr = kmap(dst_vcpu->clock_page);
>

kmap() is bad since the page can move due to swapping.
kvm_write_guest() is your friend.

> +static inline void release_clock(struct kvm_vcpu *vcpu)
> +{
> + if (vcpu->clock_page)
> + kunmap(vcpu->clock_page);
> +}
>

While it's a static inline, please prefix with kvm_ in case one day it
isn't.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2007-11-07 05:55:53

by Avi Kivity

[permalink] [raw]

Subject: Re: [kvm-devel] include files for kvmclock

Glauber de Oliveira Costa wrote:
>> +union kvm_hv_clock {
>> + struct {
>> + u64 tsc_mult;
>> + u64 now_ns;
>> + /* That's the wall clock, not the water closet */
>> + u64 wc_sec;
>> + u64 wc_nsec;
>>

Do we really need 128-bit time? you must be planning to live forever.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2007-11-07 05:59:15

by Jeremy Fitzhardinge

[permalink] [raw]

Subject: Re: [kvm-devel] include files for kvmclock

Avi Kivity wrote:
> Glauber de Oliveira Costa wrote:
>
>>> +union kvm_hv_clock {
>>> + struct {
>>> + u64 tsc_mult;
>>> + u64 now_ns;
>>> + /* That's the wall clock, not the water closet */
>>> + u64 wc_sec;
>>> + u64 wc_nsec;
>>>
>>>
>
> Do we really need 128-bit time? you must be planning to live forever.
>

Well, he's planning on having lots of very small nanoseconds.

J

2007-11-07 08:18:32

by Akio Takebe

[permalink] [raw]

Subject: Re: [kvm-devel] include files for kvmclock

Hi, Glauber

This is interesting facility. :-)

>+#define KVM_HCALL_REGISTER_CLOCK 1
>+
>+union kvm_hv_clock {
>+ struct {
>+ u64 tsc_mult;
>+ u64 now_ns;
>+ /* That's the wall clock, not the water closet */
>+ u64 wc_sec;
>+ u64 wc_nsec;
>+ u64 last_tsc;
>+ /* At first, we could use the tsc value as a marker, but Jeremy
>+ * well noted that it will cause us locking problems in 32-bit
>+ * sys, so we have a special version field */
>+ u32 version;
>+ };
>+ char page_align[PAGE_SIZE];
>+};
>+
Why does kvm_hv_clock need page_align?
And also the kvm_hv_clock is alloced with kvm_vcpu,
so the align is not enough, isn't it?
I thik __atribute__((__aligne__(PAGE_SIZE)))) is better than it.

Best Regards,

Akio Takebe

2007-11-07 12:49:18

by Glauber Costa

[permalink] [raw]

Subject: Re: [kvm-devel] include files for kvmclock

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jeremy Fitzhardinge escreveu:
> Avi Kivity wrote:
>> Glauber de Oliveira Costa wrote:
>>
>>>> +union kvm_hv_clock {
>>>> + struct {
>>>> + u64 tsc_mult;
>>>> + u64 now_ns;
>>>> + /* That's the wall clock, not the water closet */
>>>> + u64 wc_sec;
>>>> + u64 wc_nsec;
>>>>
>>>>
>> Do we really need 128-bit time? you must be planning to live forever.
>>
>
> Well, he's planning on having lots of very small nanoseconds.
>
> J
The wc_nsec is legacy, and should be gone. It's not really used in
current code. However, you gave me a very good idea. Living forever
would be awesome! Where can I apply ?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org

iD8DBQFHMbRfjYI8LaFUWXMRAjsQAJ4vDBW0M48fMaL9sl6XfN0+Pd82egCgutwe
9rR7+H8MUQznyinlJc76kbo=
=b3wx
-----END PGP SIGNATURE-----

2007-11-07 13:07:30

by Glauber Costa

[permalink] [raw]

Subject: Re: kvmclock - the host part.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Avi Kivity escreveu:
> Glauber de Oliveira Costa wrote:
>> This is the host part of kvm clocksource implementation. As it does
>> not include clockevents, it is a fairly simple implementation. We
>> only have to register a per-vcpu area, and start writting to it periodically.
>>
>> Signed-off-by: Glauber de Oliveira Costa <[email protected]>
>> ---
>> drivers/kvm/irq.c | 1 +
>> drivers/kvm/kvm_main.c | 2 +
>> drivers/kvm/svm.c | 1 +
>> drivers/kvm/vmx.c | 1 +
>> drivers/kvm/x86.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++
>> drivers/kvm/x86.h | 13 ++++++++++
>> 6 files changed, 77 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/kvm/irq.c b/drivers/kvm/irq.c
>> index 22bfeee..0344879 100644
>> --- a/drivers/kvm/irq.c
>> +++ b/drivers/kvm/irq.c
>> @@ -92,6 +92,7 @@ void kvm_vcpu_kick_request(struct kvm_vcpu *vcpu, int request)
>>
>> void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu)
>> {
>> + vcpu->time_needs_update = 1;
>>
>
> Why here and not in __vcpu_run()? It isn't timer irq related.
Because my plan was exactly, updating it at each timer interrupt.
There's a trade off between
updating every run (hopefully more precision, but more overhead), versus
updating at timer irqs, or other events.

What would you prefer?

>> @@ -1242,6 +1243,7 @@ static long kvm_dev_ioctl(struct file *filp,
>> case KVM_CAP_MMU_SHADOW_CACHE_CONTROL:
>> case KVM_CAP_USER_MEMORY:
>> case KVM_CAP_SET_TSS_ADDR:
>> + case KVM_CAP_CLK:
>>
>
> It's just a clock source now, right? so _CLOCK_SOURCE.
Right.

>> +static void kvm_write_guest_time(struct kvm_vcpu *vcpu)
>> +{
>> + struct timespec ts;
>> + void *clock_addr;
>> +
>> +
>> + if (!vcpu->clock_page)
>> + return;
>> +
>> + /* Updates version to the next odd number, indicating we're writing */
>> + vcpu->hv_clock.version++;
>>
>
> No one can actually see this as you're updating a private structure.
> You need to copy it to guestspace.
That's true, I'm just copying it at the end, the whole thing. thanks.

>> + /* Updating the tsc count is the first thing we do */
>> + kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER, &vcpu->hv_clock.last_tsc);
>> + ktime_get_ts(&ts);
>> + vcpu->hv_clock.now_ns = ts.tv_nsec + (NSEC_PER_SEC * (u64)ts.tv_sec);
>> + vcpu->hv_clock.wc_sec = get_seconds();
>> + vcpu->hv_clock.version++;
>> +
>> + clock_addr = vcpu->clock_addr;
>> + memcpy(clock_addr, &vcpu->hv_clock, sizeof(vcpu->hv_clock));
>> + mark_page_dirty(vcpu->kvm, vcpu->clock_gfn);
>>
>
> Just use kvm_write_guest().
Too slow. Updating guest time, even only in timer interrupts, was a too
frequent operation, and the kmap / kunmap (atomic) at every iteration
deemed the whole thing
unusable.

>> +
>> + vcpu->time_needs_update = 0;
>> +}
>> +
>> int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>> {
>> unsigned long nr, a0, a1, a2, a3, ret;
>> @@ -1648,7 +1674,33 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>> a3 &= 0xFFFFFFFF;
>> }
>>
>> + ret = 0;
>> switch (nr) {
>> + case KVM_HCALL_REGISTER_CLOCK: {
>> + struct kvm_vcpu *dst_vcpu;
>> +
>> + if (!((a1 < KVM_MAX_VCPUS) && (vcpu->kvm->vcpus[a1]))) {
>> + ret = -KVM_EINVAL;
>> + break;
>> + }
>> +
>> + dst_vcpu = vcpu->kvm->vcpus[a1];
>>
>
> What if !dst_vcpu? What about locking?
>
> Suggest simply using vcpu. Every guest cpu can register its own
Earlier version had a check for !dst_vcpu, you are absolutely right.

Locking was not a problem in practice, because these operations are done
serialized, by the same cpu.

This hypercall is called by cpu_up, which, at least in the beginning,
it's called by cpu0. And that's why each vcpu cannot register its own.
(And why we don't need locking).

Well, theorectically each vcpu do can register its own clocksource, it
will just be a little bit more complicated, we have to fire out an IPI,
and have the other cpu to catch it, and call the hypercall.

But I honestly don't like it.
Usually, the cpu leaves start_secondary with a clock already registered,
so the kernel relies on it.

>
>> + dst_vcpu->clock_page = gfn_to_page(vcpu->kvm, a0 >> PAGE_SHIFT);
>>
>
> Shift right? Why?
a0 is not a gfn, but a physical address.

>> +
>> + if (!dst_vcpu->clock_page) {
>>
>
> IIRC gfn_to_page() never returns NULL, need a different check.
You are right. I developed part of it in an older version of kvm, where
reality was:

struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
{
struct kvm_memory_slot *slot;

gfn = unalias_gfn(kvm, gfn);
slot = __gfn_to_memslot(kvm, gfn);
if (!slot)
return NULL;
return slot->phys_mem[gfn - slot->base_gfn];
}

Will update.

>> + ret = -KVM_EINVAL;
>> + break;
>> + }
>> + dst_vcpu->clock_gfn = a0 >> PAGE_SHIFT;
>> +
>> + dst_vcpu->hv_clock.tsc_mult = clocksource_khz2mult(tsc_khz, 22);
>> + dst_vcpu->clock_addr = kmap(dst_vcpu->clock_page);
>>
>
> kmap() is bad since the page can move due to swapping.
> kvm_write_guest() is your friend.
Yeah , right, but again: It will be slow to the point of making the
whole thing not worthy. So what alternatives do we get?

>> +static inline void release_clock(struct kvm_vcpu *vcpu)
>> +{
>> + if (vcpu->clock_page)
>> + kunmap(vcpu->clock_page);
>> +}
>>
>
>
> While it's a static inline, please prefix with kvm_ in case one day it
> isn't.
>

Okay, will do.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org

iD8DBQFHMbjDjYI8LaFUWXMRAvcnAKCZOtPqHAxvcUkAfSaOezPWq1ib2wCg1TNz
fT1rt86/j25K/6lmFsRmbI0=
=nSkW
-----END PGP SIGNATURE-----

2007-11-07 13:16:17

by Glauber Costa

[permalink] [raw]

Subject: Re: [kvm-devel] include files for kvmclock

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Akio Takebe escreveu:
> Hi, Glauber
>
> This is interesting facility. :-)
>
>> +#define KVM_HCALL_REGISTER_CLOCK 1
>> +
>> +union kvm_hv_clock {
>> + struct {
>> + u64 tsc_mult;
>> + u64 now_ns;
>> + /* That's the wall clock, not the water closet */
>> + u64 wc_sec;
>> + u64 wc_nsec;
>> + u64 last_tsc;
>> + /* At first, we could use the tsc value as a marker, but Jeremy
>> + * well noted that it will cause us locking problems in 32-bit
>> + * sys, so we have a special version field */
>> + u32 version;
>> + };
>> + char page_align[PAGE_SIZE];
>> +};
>> +
> Why does kvm_hv_clock need page_align?
Each vcpu will register a page on its own. In the guest side, it will be
an array of pages. So, we make it page sized.

> And also the kvm_hv_clock is alloced with kvm_vcpu,
There's no requirements on the host part at all. So it doesn't really
matter. In the next version, I may make it even a simple pointer.

> so the align is not enough, isn't it?
> I thik __atribute__((__aligne__(PAGE_SIZE)))) is better than it.
It deals with the start of the structure, but not with its size. See the
guest part: Where it matters, I do use it.

Thanks

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org

iD8DBQFHMbqyjYI8LaFUWXMRAgfOAKCTeKF3cWbhILYSXY+MjtXo8B87EwCeNNhn
z9RDYaCWHIxsqlciMF0i27w=
=EIEM
-----END PGP SIGNATURE-----

2007-11-07 13:29:46

by Akio Takebe

[permalink] [raw]

Subject: Re: [kvm-devel] include files for kvmclock

Hi,

>> Why does kvm_hv_clock need page_align?
>Each vcpu will register a page on its own. In the guest side, it will be
>an array of pages. So, we make it page sized.
>
>> And also the kvm_hv_clock is alloced with kvm_vcpu,
>There's no requirements on the host part at all. So it doesn't really
>matter. In the next version, I may make it even a simple pointer.
>
>> so the align is not enough, isn't it?
>> I thik __atribute__((__aligne__(PAGE_SIZE)))) is better than it.
>It deals with the start of the structure, but not with its size. See the
>guest part: Where it matters, I do use it.
Oh, I confused the guest/host part, but I could understand it.
Thank you for your explanation.

Best Regards,

Akio Takebe

2007-11-07 14:00:52

by Avi Kivity

[permalink] [raw]

Subject: Re: kvmclock - the host part.

Glauber de Oliveira Costa wrote:
>>> void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu)
>>> {
>>> + vcpu->time_needs_update = 1;
>>>
>>>
>> Why here and not in __vcpu_run()? It isn't timer irq related.
>>
> Because my plan was exactly, updating it at each timer interrupt.
>

I think kvm_inject_pending_timer_irqs() is called every __vcpu_run(), so
your cunning plan has been foiled.

Did you mean each guest interrupt of host interrupt?

> There's a trade off between
> updating every run (hopefully more precision, but more overhead), versus
> updating at timer irqs, or other events.
>
> What would you prefer?
>

I think that we should update it every time a heavyweight exit has been
taken. That takes care of the tradeoff quite nicely -- heavyweight
exits are already dog slow.

>
>
>>> + /* Updating the tsc count is the first thing we do */
>>> + kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER, &vcpu->hv_clock.last_tsc);
>>> + ktime_get_ts(&ts);
>>> + vcpu->hv_clock.now_ns = ts.tv_nsec + (NSEC_PER_SEC * (u64)ts.tv_sec);
>>> + vcpu->hv_clock.wc_sec = get_seconds();
>>> + vcpu->hv_clock.version++;
>>> +
>>> + clock_addr = vcpu->clock_addr;
>>> + memcpy(clock_addr, &vcpu->hv_clock, sizeof(vcpu->hv_clock));
>>> + mark_page_dirty(vcpu->kvm, vcpu->clock_gfn);
>>>
>>>
>> Just use kvm_write_guest().
>>
> Too slow. Updating guest time, even only in timer interrupts, was a too
> frequent operation, and the kmap / kunmap (atomic) at every iteration
> deemed the whole thing
> unusable.
>

kvm_write_guest() will eventually be a copy_to_user(), so you need not
fear the overhead.

>>>
>>> + ret = 0;
>>> switch (nr) {
>>> + case KVM_HCALL_REGISTER_CLOCK: {
>>> + struct kvm_vcpu *dst_vcpu;
>>> +
>>> + if (!((a1 < KVM_MAX_VCPUS) && (vcpu->kvm->vcpus[a1]))) {
>>> + ret = -KVM_EINVAL;
>>> + break;
>>> + }
>>> +
>>> + dst_vcpu = vcpu->kvm->vcpus[a1];
>>>
>>>
>> What if !dst_vcpu? What about locking?
>>
>> Suggest simply using vcpu. Every guest cpu can register its own
>>
> Earlier version had a check for !dst_vcpu, you are absolutely right.
>
> Locking was not a problem in practice, because these operations are done
> serialized, by the same cpu.
>

Think evil guest that cares not for the well-being of the host.

> This hypercall is called by cpu_up, which, at least in the beginning,
> it's called by cpu0. And that's why each vcpu cannot register its own.
> (And why we don't need locking).
>
> Well, theorectically each vcpu do can register its own clocksource, it
> will just be a little bit more complicated, we have to fire out an IPI,
> and have the other cpu to catch it, and call the hypercall.
>
>

Can it not be done via the processor startup sequence? Then there's no
need for ipis and locking.

I imagine a normal guest initializes the apic in the same way.

> But I honestly don't like it.
> Usually, the cpu leaves start_secondary with a clock already registered,
> so the kernel relies on it.
>
>
>>> + dst_vcpu->clock_page = gfn_to_page(vcpu->kvm, a0 >> PAGE_SHIFT);
>>>
>>>
>> Shift right? Why?
>>
> a0 is not a gfn, but a physical address.
>

What if the guest wants to place it in address 5GB? That's unlikely for
Linux and Windows, but let's do it right anyway.

--
error compiling committee.c: too many arguments to function