Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754131AbcKJADD (ORCPT ); Wed, 9 Nov 2016 19:03:03 -0500 Received: from mx1.redhat.com ([209.132.183.28]:38710 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751427AbcKJADB (ORCPT ); Wed, 9 Nov 2016 19:03:01 -0500 Date: Wed, 9 Nov 2016 19:43:32 -0200 From: Marcelo Tosatti To: Paolo Bonzini Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Radim =?utf-8?B?S3LEjW3DocWZ?= Subject: Re: [PATCH] kvm: kvmclock: let KVM_GET_CLOCK return whether the master clock is in use Message-ID: <20161109214329.GA5611@amt.cnet> References: <1478710095-15022-1-git-send-email-pbonzini@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1478710095-15022-1-git-send-email-pbonzini@redhat.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Thu, 10 Nov 2016 00:03:01 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4173 Lines: 111 On Wed, Nov 09, 2016 at 05:48:15PM +0100, Paolo Bonzini wrote: > Userspace can read the exact value of kvmclock by reading the TSC > and fetching the timekeeping parameters out of guest memory. This > however is brittle and not necessary anymore with KVM 4.11. Provide > a mechanism that lets userspace know if the new KVM_GET_CLOCK > semantics are in effect, and---since we are at it---if the clock > is stable across all VCPUs. > > Cc: Radim Krčmář > Cc: Marcelo Tosatti > Signed-off-by: Paolo Bonzini > --- > Documentation/virtual/kvm/api.txt | 11 +++++++++++ > arch/x86/kvm/x86.c | 10 +++++++--- > include/uapi/linux/kvm.h | 7 +++++++ > 3 files changed, 25 insertions(+), 3 deletions(-) > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt > index 739db9ab16b2..6bbceb9a3a19 100644 > --- a/Documentation/virtual/kvm/api.txt > +++ b/Documentation/virtual/kvm/api.txt > @@ -777,6 +777,17 @@ Gets the current timestamp of kvmclock as seen by the current guest. In > conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity on scenarios > such as migration. > > +When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the > +set of bits that KVM can return in struct kvm_clock_data's flag member. > + > +The only flag defined now is KVM_CLOCK_TSC_STABLE. If set, the returned > +value is the exact kvmclock value seen by all VCPUs at the instant > +when KVM_GET_CLOCK was called. If clear, the returned value is simply > +CLOCK_MONOTONIC plus a constant offset; the offset can be modified > +with KVM_SET_CLOCK. KVM will try to make all VCPUs follow this clock, > +but the exact value read by each VCPU could differ, because the host > +TSC is not stable. > + > struct kvm_clock_data { > __u64 clock; /* kvmclock current value */ > __u32 flags; > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 3017de0431bd..1ba08278a9a9 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -2596,7 +2596,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) > case KVM_CAP_PIT_STATE2: > case KVM_CAP_SET_IDENTITY_MAP_ADDR: > case KVM_CAP_XEN_HVM: > - case KVM_CAP_ADJUST_CLOCK: > case KVM_CAP_VCPU_EVENTS: > case KVM_CAP_HYPERV: > case KVM_CAP_HYPERV_VAPIC: > @@ -2623,6 +2622,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) > #endif > r = 1; > break; > + case KVM_CAP_ADJUST_CLOCK: > + r = KVM_CLOCK_TSC_STABLE; > + break; > case KVM_CAP_X86_SMM: > /* SMBASE is usually relocated above 1M on modern chipsets, > * and SMM handlers might indeed rely on 4G segment limits, > @@ -4103,9 +4105,11 @@ long kvm_arch_vm_ioctl(struct file *filp, > struct kvm_clock_data user_ns; > u64 now_ns; > > - now_ns = get_kvmclock_ns(kvm); > + local_irq_disable(); > + now_ns = __get_kvmclock_ns(kvm); > user_ns.clock = now_ns; > - user_ns.flags = 0; > + user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0; > + local_irq_enable(); > memset(&user_ns.pad, 0, sizeof(user_ns.pad)); > > r = -EFAULT; > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index 300ef255d1e0..4ee67cb99143 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -972,12 +972,19 @@ struct kvm_irqfd { > __u8 pad[16]; > }; > > +/* For KVM_CAP_ADJUST_CLOCK */ > + > +/* Do not use 1, KVM_CHECK_EXTENSION returned it before we had flags. */ > +#define KVM_CLOCK_TSC_STABLE 2 > + > struct kvm_clock_data { > __u64 clock; > __u32 flags; > __u32 pad[9]; > }; > > +/* For KVM_CAP_SW_TLB */ > + > #define KVM_MMU_FSL_BOOKE_NOHV 0 > #define KVM_MMU_FSL_BOOKE_HV 1 > > -- > 1.8.3.1 Nevermind the previous comments - imagined you were using CLOCK_MONOTONIC_RAW. Looks good to me modulo the race mentioned on the other email: lets see if KVM_GET_CLOCK at pre_save gets rid of that 100ms. Why is advance_ns a hack? Would like to send over the delta between kvmclock pre_save and EOF packet so destination can advance that as well (which requires separate migration entry).