Date: Mon, 14 Nov 2016 15:05:52 -0200
From: Marcelo Tosatti
To: Radim Krčmář
Cc: Paolo Bonzini, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH] KVM: x86: do not go through vcpu in __get_kvmclock_ns
Message-ID: <20161114170550.GA6838@amt.cnet>
References: <1478859141-25146-1-git-send-email-pbonzini@redhat.com> <20161114145239.GA2185@potion>
In-Reply-To: <20161114145239.GA2185@potion>

On Mon, Nov 14, 2016 at 03:52:40PM +0100, Radim Krčmář wrote:
> 2016-11-11 11:12+0100, Paolo Bonzini:
> > Going through the first VCPU is wrong if you follow a KVM_SET_CLOCK with
> > a KVM_GET_CLOCK immediately after, without letting the VCPU run and
> > call kvm_guest_time_update.
> >
> > This is easily fixed however, because kvm_get_time_and_clockread provides
> > the information we want.
> >
> > Reported-by: Marcelo Tosatti
> > Signed-off-by: Paolo Bonzini
> > ---
> >  arch/x86/kvm/x86.c | 18 ++++++++++--------
> >  1 file changed, 10 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 1ba08278a9a9..1c16c6d7df7a 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1620,6 +1620,11 @@ static bool kvm_get_time_and_clockread(s64 *kernel_ns, cycle_t *cycle_now)
> >
> >          return do_monotonic_boot(kernel_ns, cycle_now) == VCLOCK_TSC;
> >  }
> > +#else
> > +static inline bool kvm_get_time_and_clockread(s64 *kernel_ns, cycle_t *cycle_now)
> > +{
> > +        return false;
> > +}
> >  #endif
> >
> >  /*
> > @@ -1724,18 +1729,15 @@ static void kvm_gen_update_masterclock(struct kvm *kvm)
> >
> >  static u64 __get_kvmclock_ns(struct kvm *kvm)
> >  {
> > -        struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, 0);
> >          struct kvm_arch *ka = &kvm->arch;
> > +        cycle_t cycle_now;
> >          s64 ns;
> >
> > -        if (vcpu->arch.hv_clock.flags & PVCLOCK_TSC_STABLE_BIT) {
> > -                u64 tsc = kvm_read_l1_tsc(vcpu, rdtsc());
> > -                ns = __pvclock_read_cycles(&vcpu->arch.hv_clock, tsc);
>
> This patch regresses the behavior as well, because the assumption that
> kvm_get_time_and_clockread() and __pvclock_read_cycles() count the same
> time doesn't hold. See the end of the message for a quick test.
>
> kvm_get_time_and_clockread() is actually the same as ktime_get_boot_ns()
> (if it works), so we'd be just obfuscating the code. :)
>
> I think that making kvmclock count as ktime_get_boot_ns() would be the
> best solution, but not possible this late in 4.9 ...
>
> As a quick hack, I think it would be better to duplicate the update that
> would happen when running the VCPU before calling
> __pvclock_read_cycles(), i.e.
> paste something like this:
>
>   if (kvm_check_request(KVM_REQ_CLOCK_UPDATE, vcpu))
>           kvm_guest_time_update(vcpu);
>
> > -        } else {
> > -                ns = ktime_get_boot_ns() + ka->kvmclock_offset;
> > -        }
> > +        if (!ka->use_master_clock ||
> > +            !kvm_get_time_and_clockread(&ns, &cycle_now))
> > +                ns = ktime_get_boot_ns();
> >
> > -        return ns;
> > +        return ns + ka->kvmclock_offset;
> >  }
>
> The hunk below should return the same value in pvclock_ns and kernel_ns
> if they can be used interchangeably. boot_ns is expected to be a bit
> delayed, because it is read late. boot_ns shows a bounded offset from
> kernel_ns, unlike the drifting pvclock_ns.
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 83990ad3710e..30d4d3d02ac7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6653,6 +6653,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>                  goto cancel_injection;
>          }
>
> +        if (vcpu->kvm->arch.use_master_clock) {
> +                s64 kernel_ns;
> +                cycle_t tsc_now, pvclock_ns, boot_ns;
> +
> +                kvm_get_time_and_clockread(&kernel_ns, &tsc_now);
> +                pvclock_ns = __pvclock_read_cycles(&vcpu->arch.hv_clock, kvm_read_l1_tsc(vcpu, tsc_now)) - vcpu->kvm->arch.kvmclock_offset;
> +                boot_ns = ktime_get_boot_ns();
> +
> +                printk("ns diff: %lld %lld\n", pvclock_ns - kernel_ns, boot_ns - kernel_ns);
> +        }
> +
>          preempt_disable();
>
>          kvm_x86_ops->prepare_guest_switch(vcpu);
>
> and a sample output:

KVM_GET_CLOCK should return what the guest sees at the moment
KVM_GET_CLOCK is called, which should include

        if (vcpu->arch.hv_clock.flags & PVCLOCK_TSC_STABLE_BIT) {
                u64 tsc = kvm_read_l1_tsc(vcpu, rdtsc());
                ns = __pvclock_read_cycles(&vcpu->arch.hv_clock, tsc);
        } else {
                ns = ktime_get_boot_ns() + ka->kvmclock_offset;
>>>             add (rdtsc() - tsc_timestamp), if kvmclock is enabled
        }

That is, including the addition marked >>> above.
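
To illustrate what "what the guest sees" means numerically, here is a
minimal userspace sketch (not kernel code) of a pvclock-style read: the
snapshot time plus the scaled TSC delta since the snapshot. The names
sketch_clock, sketch_scale_delta and sketch_read_ns are made up for
illustration, and the struct is only a simplified stand-in for the
per-VCPU hv_clock data, so treat it as an assumption-laden sketch rather
than the actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-VCPU clock snapshot; not the kernel struct. */
struct sketch_clock {
	uint64_t tsc_timestamp;      /* TSC value when the snapshot was taken */
	uint64_t system_time;        /* nanoseconds at the snapshot */
	uint32_t tsc_to_system_mul;  /* fixed-point multiplier, scaled by 2^-32 */
	int8_t   tsc_shift;          /* pre-shift applied to the TSC delta */
};

/* Scale a TSC delta to nanoseconds: shift first, then (delta * mul) >> 32. */
static uint64_t sketch_scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
	uint64_t lo, hi;

	assert(shift > -64 && shift < 64);
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;

	/* Split the 64x32 multiply so no 128-bit type is needed. */
	lo = (delta & 0xffffffffULL) * mul;
	hi = (delta >> 32) * mul;
	return hi + (lo >> 32);
}

/* Time visible at tsc_now: snapshot time plus the elapsed, scaled TSC delta. */
static uint64_t sketch_read_ns(const struct sketch_clock *c, uint64_t tsc_now)
{
	assert(c != NULL);
	return c->system_time +
	       sketch_scale_delta(tsc_now - c->tsc_timestamp,
				  c->tsc_to_system_mul, c->tsc_shift);
}
```

With an assumed 2 GHz TSC (tsc_to_system_mul = 0x80000000, tsc_shift = 0),
a snapshot of system_time = 1000 taken at tsc_timestamp = 5000 and read at
TSC 7000 yields 2000 ns, whereas returning the snapshot alone would report
the stale 1000 ns.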