2009-10-06 17:25:26

by Glauber Costa

[permalink] [raw]
Subject: [PATCH] allow userspace to adjust kvmclock offset

When we migrate a kvm guest that uses pvclock between two hosts, we may
suffer a large skew. This is because there can be significant differences
between the monotonic clock of the hosts involved. When a new host with
a much larger monotonic time starts running the guest, the view of time
will be significantly impacted.

Situation is much worse when we do the opposite, and migrate to a host with
a smaller monotonic clock.

This new proposed ioctl will allow userspace to inform us what is the monotonic
clock value in the source host, so we can keep the time skew short, and more
importantly, never goes backwards.

Signed-off-by: Glauber Costa <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/x86.c | 19 ++++++++++++++++++-
include/linux/kvm.h | 1 +
3 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 179a919..c9b0d9f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -410,6 +410,7 @@ struct kvm_arch{

unsigned long irq_sources_bitmap;
u64 vm_init_tsc;
+ s64 kvmclock_offset;
};

struct kvm_vm_stat {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 747e566..9c9939e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -699,7 +699,8 @@ static void kvm_write_guest_time(struct kvm_vcpu *v)
/* With all the info we got, fill in the values */

vcpu->hv_clock.system_time = ts.tv_nsec +
- (NSEC_PER_SEC * (u64)ts.tv_sec);
+ (NSEC_PER_SEC * (u64)ts.tv_sec) + v->kvm->arch.kvmclock_offset;
+
/*
* The interface expects us to write an even number signaling that the
* update is finished. Since the guest won't see the intermediate
@@ -2437,6 +2438,22 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = 0;
break;
}
+ case KVM_ADJUST_CLOCK: {
+ struct timespec now;
+ u64 now_ns, user_ns;
+ long delta;
+
+ r = -EFAULT;
+ if (copy_from_user(&user_ns, argp, sizeof(user_ns)))
+ goto out;
+
+ r = 0;
+ ktime_get_ts(&now);
+ now_ns = timespec_to_ns(&now) + kvm->arch.kvmclock_offset;
+ delta = user_ns - now_ns;
+ kvm->arch.kvmclock_offset = delta;
+ break;
+ }
default:
;
}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index f8f8900..0cd5ad8 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -546,6 +546,7 @@ struct kvm_irqfd {
#define KVM_CREATE_PIT2 _IOW(KVMIO, 0x77, struct kvm_pit_config)
#define KVM_SET_BOOT_CPU_ID _IO(KVMIO, 0x78)
#define KVM_IOEVENTFD _IOW(KVMIO, 0x79, struct kvm_ioeventfd)
+#define KVM_ADJUST_CLOCK _IOW(KVMIO, 0x7a, __u64)

/*
* ioctls for vcpu fds
--
1.6.2.2


2009-10-12 08:53:56

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH] allow userspace to adjust kvmclock offset

On 10/06/2009 07:24 PM, Glauber Costa wrote:
> When we migrate a kvm guest that uses pvclock between two hosts, we may
> suffer a large skew. This is because there can be significant differences
> between the monotonic clock of the hosts involved. When a new host with
> a much larger monotonic time starts running the guest, the view of time
> will be significantly impacted.
>
> Situation is much worse when we do the opposite, and migrate to a host with
> a smaller monotonic clock.
>
> This new proposed ioctl will allow userspace to inform us what is the monotonic
> clock value in the source host, so we can keep the time skew short, and more
> importantly, never goes backwards.
>
>

> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index f8f8900..0cd5ad8 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -546,6 +546,7 @@ struct kvm_irqfd {
> #define KVM_CREATE_PIT2 _IOW(KVMIO, 0x77, struct kvm_pit_config)
> #define KVM_SET_BOOT_CPU_ID _IO(KVMIO, 0x78)
> #define KVM_IOEVENTFD _IOW(KVMIO, 0x79, struct kvm_ioeventfd)
> +#define KVM_ADJUST_CLOCK _IOW(KVMIO, 0x7a, __u64)
>

Please change to a struct with some reserved space.

Do we want an absolute or relative adjustment?

--
error compiling committee.c: too many arguments to function

2009-10-13 12:28:58

by Glauber Costa

[permalink] [raw]
Subject: Re: [PATCH] allow userspace to adjust kvmclock offset

On Mon, Oct 12, 2009 at 10:53:26AM +0200, Avi Kivity wrote:
> On 10/06/2009 07:24 PM, Glauber Costa wrote:
>> When we migrate a kvm guest that uses pvclock between two hosts, we may
>> suffer a large skew. This is because there can be significant differences
>> between the monotonic clock of the hosts involved. When a new host with
>> a much larger monotonic time starts running the guest, the view of time
>> will be significantly impacted.
>>
>> Situation is much worse when we do the opposite, and migrate to a host with
>> a smaller monotonic clock.
>>
>> This new proposed ioctl will allow userspace to inform us what is the monotonic
>> clock value in the source host, so we can keep the time skew short, and more
>> importantly, never goes backwards.
>>
>>
>
>> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
>> index f8f8900..0cd5ad8 100644
>> --- a/include/linux/kvm.h
>> +++ b/include/linux/kvm.h
>> @@ -546,6 +546,7 @@ struct kvm_irqfd {
>> #define KVM_CREATE_PIT2 _IOW(KVMIO, 0x77, struct kvm_pit_config)
>> #define KVM_SET_BOOT_CPU_ID _IO(KVMIO, 0x78)
>> #define KVM_IOEVENTFD _IOW(KVMIO, 0x79, struct kvm_ioeventfd)
>> +#define KVM_ADJUST_CLOCK _IOW(KVMIO, 0x7a, __u64)
>>
>
> Please change to a struct with some reserved space.
Ok, can do it.

>
> Do we want an absolute or relative adjustment?
What exactly do you mean?

2009-10-13 12:31:40

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH] allow userspace to adjust kvmclock offset

On 10/13/2009 03:28 PM, Glauber Costa wrote:
>
>> Do we want an absolute or relative adjustment?
>>
> What exactly do you mean?
>

Absolute adjustment: clock = t
Relative adjustment: clock += t


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-10-13 12:47:08

by Glauber Costa

[permalink] [raw]
Subject: Re: [PATCH] allow userspace to adjust kvmclock offset

On Tue, Oct 13, 2009 at 03:31:08PM +0300, Avi Kivity wrote:
> On 10/13/2009 03:28 PM, Glauber Costa wrote:
>>
>>> Do we want an absolute or relative adjustment?
>>>
>> What exactly do you mean?
>>
>
> Absolute adjustment: clock = t
> Relative adjustment: clock += t
The delta is absolute, but the adjustment in the clock is relative.

So we pick the difference between what userspace is passing us and what
we currently have, then relatively adds up so we can make sure we won't
go back or suffer a too big skew.

2009-10-15 00:47:22

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH] allow userspace to adjust kvmclock offset

On 10/13/2009 09:46 PM, Glauber Costa wrote:
> On Tue, Oct 13, 2009 at 03:31:08PM +0300, Avi Kivity wrote:
>
>> On 10/13/2009 03:28 PM, Glauber Costa wrote:
>>
>>>
>>>> Do we want an absolute or relative adjustment?
>>>>
>>>>
>>> What exactly do you mean?
>>>
>>>
>> Absolute adjustment: clock = t
>> Relative adjustment: clock += t
>>
> The delta is absolute, but the adjustment in the clock is relative.
>
> So we pick the difference between what userspace is passing us and what
> we currently have, then relatively adds up so we can make sure we won't
> go back or suffer a too big skew.
>

The motivation for relative adjustment is when you have a jitter
resistant place to gather timing information (like the kernel, which can
disable interrupts and preemption), then pass it on to kvm without
losing information due to scheduling. For migration there is no such
place since it involves two hosts, but it makes sense to support
relative adjustments.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-10-15 14:59:26

by Glauber Costa

[permalink] [raw]
Subject: Re: [PATCH] allow userspace to adjust kvmclock offset

On Thu, Oct 15, 2009 at 09:46:52AM +0900, Avi Kivity wrote:
> On 10/13/2009 09:46 PM, Glauber Costa wrote:
>> On Tue, Oct 13, 2009 at 03:31:08PM +0300, Avi Kivity wrote:
>>
>>> On 10/13/2009 03:28 PM, Glauber Costa wrote:
>>>
>>>>
>>>>> Do we want an absolute or relative adjustment?
>>>>>
>>>>>
>>>> What exactly do you mean?
>>>>
>>>>
>>> Absolute adjustment: clock = t
>>> Relative adjustment: clock += t
>>>
>> The delta is absolute, but the adjustment in the clock is relative.
>>
>> So we pick the difference between what userspace is passing us and what
>> we currently have, then relatively adds up so we can make sure we won't
>> go back or suffer a too big skew.
>>
>
> The motivation for relative adjustment is when you have a jitter
> resistant place to gather timing information (like the kernel, which can
> disable interrupts and preemption), then pass it on to kvm without
> losing information due to scheduling. For migration there is no such
> place since it involves two hosts, but it makes sense to support
> relative adjustments.
Since we added the padding you asked for, we could use that bit of information
to define whether it will be a relative or absolute adjustment, then. Right now,
I don't see the point of implementing a code path that will be completely untested.

I'd leave it this way until someone comes up with a need.

2009-10-22 16:23:35

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH] allow userspace to adjust kvmclock offset

On 10/15/2009 04:58 PM, Glauber Costa wrote:
>
>> The motivation for relative adjustment is when you have a jitter
>> resistant place to gather timing information (like the kernel, which can
>> disable interrupts and preemption), then pass it on to kvm without
>> losing information due to scheduling. For migration there is no such
>> place since it involves two hosts, but it makes sense to support
>> relative adjustments.
>>
> Since we added the padding you asked for, we could use that bit of information
> to define whether it will be a relative or absolute adjustment, then. Right now,
> I don't see the point of implementing a code path that will be completely untested.
>
> I'd leave it this way until someone comes up with a need.
>

I agree with that, but padding by itself is insufficient. You also need
a flags field.

--
error compiling committee.c: too many arguments to function