2016-11-09 16:48:22

by Paolo Bonzini

[permalink] [raw]
Subject: [PATCH] kvm: kvmclock: let KVM_GET_CLOCK return whether the master clock is in use

Userspace can read the exact value of kvmclock by reading the TSC
and fetching the timekeeping parameters out of guest memory. This
however is brittle and not necessary anymore with KVM 4.11. Provide
a mechanism that lets userspace know if the new KVM_GET_CLOCK
semantics are in effect, and---since we are at it---if the clock
is stable across all VCPUs.

Cc: Radim Krčmář <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
Documentation/virtual/kvm/api.txt | 11 +++++++++++
arch/x86/kvm/x86.c | 10 +++++++---
include/uapi/linux/kvm.h | 7 +++++++
3 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 739db9ab16b2..6bbceb9a3a19 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -777,6 +777,17 @@ Gets the current timestamp of kvmclock as seen by the current guest. In
conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity on scenarios
such as migration.

+When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the
+set of bits that KVM can return in struct kvm_clock_data's flag member.
+
+The only flag defined now is KVM_CLOCK_TSC_STABLE. If set, the returned
+value is the exact kvmclock value seen by all VCPUs at the instant
+when KVM_GET_CLOCK was called. If clear, the returned value is simply
+CLOCK_MONOTONIC plus a constant offset; the offset can be modified
+with KVM_SET_CLOCK. KVM will try to make all VCPUs follow this clock,
+but the exact value read by each VCPU could differ, because the host
+TSC is not stable.
+
struct kvm_clock_data {
__u64 clock; /* kvmclock current value */
__u32 flags;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3017de0431bd..1ba08278a9a9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2596,7 +2596,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_PIT_STATE2:
case KVM_CAP_SET_IDENTITY_MAP_ADDR:
case KVM_CAP_XEN_HVM:
- case KVM_CAP_ADJUST_CLOCK:
case KVM_CAP_VCPU_EVENTS:
case KVM_CAP_HYPERV:
case KVM_CAP_HYPERV_VAPIC:
@@ -2623,6 +2622,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
#endif
r = 1;
break;
+ case KVM_CAP_ADJUST_CLOCK:
+ r = KVM_CLOCK_TSC_STABLE;
+ break;
case KVM_CAP_X86_SMM:
/* SMBASE is usually relocated above 1M on modern chipsets,
* and SMM handlers might indeed rely on 4G segment limits,
@@ -4103,9 +4105,11 @@ long kvm_arch_vm_ioctl(struct file *filp,
struct kvm_clock_data user_ns;
u64 now_ns;

- now_ns = get_kvmclock_ns(kvm);
+ local_irq_disable();
+ now_ns = __get_kvmclock_ns(kvm);
user_ns.clock = now_ns;
- user_ns.flags = 0;
+ user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0;
+ local_irq_enable();
memset(&user_ns.pad, 0, sizeof(user_ns.pad));

r = -EFAULT;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 300ef255d1e0..4ee67cb99143 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -972,12 +972,19 @@ struct kvm_irqfd {
__u8 pad[16];
};

+/* For KVM_CAP_ADJUST_CLOCK */
+
+/* Do not use 1, KVM_CHECK_EXTENSION returned it before we had flags. */
+#define KVM_CLOCK_TSC_STABLE 2
+
struct kvm_clock_data {
__u64 clock;
__u32 flags;
__u32 pad[9];
};

+/* For KVM_CAP_SW_TLB */
+
#define KVM_MMU_FSL_BOOKE_NOHV 0
#define KVM_MMU_FSL_BOOKE_HV 1

--
1.8.3.1


2016-11-09 20:13:21

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH] kvm: kvmclock: let KVM_GET_CLOCK return whether the master clock is in use

On Wed, Nov 09, 2016 at 05:48:15PM +0100, Paolo Bonzini wrote:
> Userspace can read the exact value of kvmclock by reading the TSC
> and fetching the timekeeping parameters out of guest memory. This
> however is brittle and not necessary anymore with KVM 4.11.

Hi Paolo,

Can you point to commit or explanation why that is not the case anymore?
Thanks

> Provide
> a mechanism that lets userspace know if the new KVM_GET_CLOCK
> semantics are in effect, and---since we are at it---if the clock
> is stable across all VCPUs.
>
> Cc: Radim Krčmář <[email protected]>
> Cc: Marcelo Tosatti <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> Documentation/virtual/kvm/api.txt | 11 +++++++++++
> arch/x86/kvm/x86.c | 10 +++++++---
> include/uapi/linux/kvm.h | 7 +++++++
> 3 files changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 739db9ab16b2..6bbceb9a3a19 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -777,6 +777,17 @@ Gets the current timestamp of kvmclock as seen by the current guest. In
> conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity on scenarios
> such as migration.
>
> +When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the
> +set of bits that KVM can return in struct kvm_clock_data's flag member.
> +
> +The only flag defined now is KVM_CLOCK_TSC_STABLE. If set, the returned
> +value is the exact kvmclock value seen by all VCPUs at the instant
> +when KVM_GET_CLOCK was called. If clear, the returned value is simply
> +CLOCK_MONOTONIC plus a constant offset; the offset can be modified
> +with KVM_SET_CLOCK. KVM will try to make all VCPUs follow this clock,
> +but the exact value read by each VCPU could differ, because the host
> +TSC is not stable.
> +
> struct kvm_clock_data {
> __u64 clock; /* kvmclock current value */
> __u32 flags;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3017de0431bd..1ba08278a9a9 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2596,7 +2596,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_PIT_STATE2:
> case KVM_CAP_SET_IDENTITY_MAP_ADDR:
> case KVM_CAP_XEN_HVM:
> - case KVM_CAP_ADJUST_CLOCK:
> case KVM_CAP_VCPU_EVENTS:
> case KVM_CAP_HYPERV:
> case KVM_CAP_HYPERV_VAPIC:
> @@ -2623,6 +2622,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> #endif
> r = 1;
> break;
> + case KVM_CAP_ADJUST_CLOCK:
> + r = KVM_CLOCK_TSC_STABLE;
> + break;
> case KVM_CAP_X86_SMM:
> /* SMBASE is usually relocated above 1M on modern chipsets,
> * and SMM handlers might indeed rely on 4G segment limits,
> @@ -4103,9 +4105,11 @@ long kvm_arch_vm_ioctl(struct file *filp,
> struct kvm_clock_data user_ns;
> u64 now_ns;
>
> - now_ns = get_kvmclock_ns(kvm);
> + local_irq_disable();
> + now_ns = __get_kvmclock_ns(kvm);
> user_ns.clock = now_ns;
> - user_ns.flags = 0;
> + user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0;
> + local_irq_enable();
> memset(&user_ns.pad, 0, sizeof(user_ns.pad));
>
> r = -EFAULT;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 300ef255d1e0..4ee67cb99143 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -972,12 +972,19 @@ struct kvm_irqfd {
> __u8 pad[16];
> };
>
> +/* For KVM_CAP_ADJUST_CLOCK */
> +
> +/* Do not use 1, KVM_CHECK_EXTENSION returned it before we had flags. */
> +#define KVM_CLOCK_TSC_STABLE 2
> +
> struct kvm_clock_data {
> __u64 clock;
> __u32 flags;
> __u32 pad[9];
> };
>
> +/* For KVM_CAP_SW_TLB */
> +
> #define KVM_MMU_FSL_BOOKE_NOHV 0
> #define KVM_MMU_FSL_BOOKE_HV 1
>
> --
> 1.8.3.1

2016-11-09 20:17:39

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH] kvm: kvmclock: let KVM_GET_CLOCK return whether the master clock is in use

On Wed, Nov 09, 2016 at 06:12:50PM -0200, Marcelo Tosatti wrote:
> On Wed, Nov 09, 2016 at 05:48:15PM +0100, Paolo Bonzini wrote:
> > Userspace can read the exact value of kvmclock by reading the TSC
> > and fetching the timekeeping parameters out of guest memory. This
> > however is brittle and not necessary anymore with KVM 4.11.
>
> Hi Paolo,
>
> Can you point to commit or explanation why that is not the case anymore?
> Thanks

I don't see how thats possible given the:

* TSC
* TSC (timer interrupt + TSC deltas) (AKA host TSC clocksource and
* CLOCK_MONOTONIC).

Clocks currently drift from each other (therefore are not monotonic).

Have you confirmed CLOCK_MONOTONIC_RAW and TSC are monotonic?


2016-11-09 22:03:58

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH] kvm: kvmclock: let KVM_GET_CLOCK return whether the master clock is in use



----- Original Message -----
> From: "Marcelo Tosatti" <[email protected]>
> To: "Paolo Bonzini" <[email protected]>
> Cc: [email protected], [email protected], "Radim Krčmář" <[email protected]>
> Sent: Wednesday, November 9, 2016 9:17:19 PM
> Subject: Re: [PATCH] kvm: kvmclock: let KVM_GET_CLOCK return whether the master clock is in use
>
> On Wed, Nov 09, 2016 at 06:12:50PM -0200, Marcelo Tosatti wrote:
> > On Wed, Nov 09, 2016 at 05:48:15PM +0100, Paolo Bonzini wrote:
> > > Userspace can read the exact value of kvmclock by reading the TSC
> > > and fetching the timekeeping parameters out of guest memory. This
> > > however is brittle and not necessary anymore with KVM 4.11.
> >
> > Hi Paolo,
> >
> > Can you point to commit or explanation why that is not the case anymore?
> > Thanks
>
> I don't see how thats possible given the:
>
> * TSC
> * TSC (timer interrupt + TSC deltas) (AKA host TSC clocksource and
> * CLOCK_MONOTONIC).
>
> Clocks currently drift from each other (therefore are not monotonic).

__get_kernel_ns now computes the guest's kvmclock directly when using the
master clock. Instead of peeking at guest memory, it does it from the
hypervisor's copy of the parameters.

Paolo

2016-11-10 00:03:04

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH] kvm: kvmclock: let KVM_GET_CLOCK return whether the master clock is in use

On Wed, Nov 09, 2016 at 06:17:16PM -0200, Marcelo Tosatti wrote:
> On Wed, Nov 09, 2016 at 06:12:50PM -0200, Marcelo Tosatti wrote:
> > On Wed, Nov 09, 2016 at 05:48:15PM +0100, Paolo Bonzini wrote:
> > > Userspace can read the exact value of kvmclock by reading the TSC
> > > and fetching the timekeeping parameters out of guest memory. This
> > > however is brittle and not necessary anymore with KVM 4.11.
> >
> > Hi Paolo,
> >
> > Can you point to commit or explanation why that is not the case anymore?
> > Thanks
>
> I don't see how thats possible given the:
>
> * TSC
> * TSC (timer interrupt + TSC deltas) (AKA host TSC clocksource and
> * CLOCK_MONOTONIC).
>
> Clocks currently drift from each other (therefore are not monotonic).
>
> Have you confirmed CLOCK_MONOTONIC_RAW and TSC are monotonic?

I think the benefits of CLOCK_MONOTONIC_RAW (which supposedly
is monotonic with TSC for large lenghts of time) outweight the
downsides (just have to check that).

The downsides are that on migration from

source: TSC with freq Y.
dest: TSC with freq X.

Suddenly kvmclock starts drifting at a different rate,
unsure how well Windows time service and
older NTPd respond to this (might be a regression).


2016-11-10 00:03:03

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH] kvm: kvmclock: let KVM_GET_CLOCK return whether the master clock is in use

On Wed, Nov 09, 2016 at 05:48:15PM +0100, Paolo Bonzini wrote:
> Userspace can read the exact value of kvmclock by reading the TSC
> and fetching the timekeeping parameters out of guest memory. This
> however is brittle and not necessary anymore with KVM 4.11. Provide
> a mechanism that lets userspace know if the new KVM_GET_CLOCK
> semantics are in effect, and---since we are at it---if the clock
> is stable across all VCPUs.
>
> Cc: Radim Krčmář <[email protected]>
> Cc: Marcelo Tosatti <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> Documentation/virtual/kvm/api.txt | 11 +++++++++++
> arch/x86/kvm/x86.c | 10 +++++++---
> include/uapi/linux/kvm.h | 7 +++++++
> 3 files changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 739db9ab16b2..6bbceb9a3a19 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -777,6 +777,17 @@ Gets the current timestamp of kvmclock as seen by the current guest. In
> conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity on scenarios
> such as migration.
>
> +When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the
> +set of bits that KVM can return in struct kvm_clock_data's flag member.
> +
> +The only flag defined now is KVM_CLOCK_TSC_STABLE. If set, the returned
> +value is the exact kvmclock value seen by all VCPUs at the instant
> +when KVM_GET_CLOCK was called. If clear, the returned value is simply
> +CLOCK_MONOTONIC plus a constant offset; the offset can be modified
> +with KVM_SET_CLOCK. KVM will try to make all VCPUs follow this clock,
> +but the exact value read by each VCPU could differ, because the host
> +TSC is not stable.
> +
> struct kvm_clock_data {
> __u64 clock; /* kvmclock current value */
> __u32 flags;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3017de0431bd..1ba08278a9a9 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2596,7 +2596,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_PIT_STATE2:
> case KVM_CAP_SET_IDENTITY_MAP_ADDR:
> case KVM_CAP_XEN_HVM:
> - case KVM_CAP_ADJUST_CLOCK:
> case KVM_CAP_VCPU_EVENTS:
> case KVM_CAP_HYPERV:
> case KVM_CAP_HYPERV_VAPIC:
> @@ -2623,6 +2622,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> #endif
> r = 1;
> break;
> + case KVM_CAP_ADJUST_CLOCK:
> + r = KVM_CLOCK_TSC_STABLE;
> + break;
> case KVM_CAP_X86_SMM:
> /* SMBASE is usually relocated above 1M on modern chipsets,
> * and SMM handlers might indeed rely on 4G segment limits,
> @@ -4103,9 +4105,11 @@ long kvm_arch_vm_ioctl(struct file *filp,
> struct kvm_clock_data user_ns;
> u64 now_ns;
>
> - now_ns = get_kvmclock_ns(kvm);
> + local_irq_disable();
> + now_ns = __get_kvmclock_ns(kvm);
> user_ns.clock = now_ns;
> - user_ns.flags = 0;
> + user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0;
> + local_irq_enable();
> memset(&user_ns.pad, 0, sizeof(user_ns.pad));
>
> r = -EFAULT;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 300ef255d1e0..4ee67cb99143 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -972,12 +972,19 @@ struct kvm_irqfd {
> __u8 pad[16];
> };
>
> +/* For KVM_CAP_ADJUST_CLOCK */
> +
> +/* Do not use 1, KVM_CHECK_EXTENSION returned it before we had flags. */
> +#define KVM_CLOCK_TSC_STABLE 2
> +
> struct kvm_clock_data {
> __u64 clock;
> __u32 flags;
> __u32 pad[9];
> };
>
> +/* For KVM_CAP_SW_TLB */
> +
> #define KVM_MMU_FSL_BOOKE_NOHV 0
> #define KVM_MMU_FSL_BOOKE_HV 1
>
> --
> 1.8.3.1

Nevermind the previous comments - imagined you were using CLOCK_MONOTONIC_RAW.

Looks good to me modulo the race mentioned on the other email: lets see if
KVM_GET_CLOCK at pre_save gets rid of that 100ms.

Why is advance_ns a hack? Would like to send over the delta between
kvmclock pre_save and EOF packet so destination can advance that
as well (which requires separate migration entry).


2016-11-19 19:34:40

by Radim Krčmář

[permalink] [raw]
Subject: Re: [PATCH] kvm: kvmclock: let KVM_GET_CLOCK return whether the master clock is in use

2016-11-09 17:48+0100, Paolo Bonzini:
> Userspace can read the exact value of kvmclock by reading the TSC
> and fetching the timekeeping parameters out of guest memory. This
> however is brittle and not necessary anymore with KVM 4.11. Provide
> a mechanism that lets userspace know if the new KVM_GET_CLOCK
> semantics are in effect, and---since we are at it---if the clock
> is stable across all VCPUs.
>
> Cc: Radim Krčmář <[email protected]>
> Cc: Marcelo Tosatti <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---

Applied to kvm/master, thanks.