Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753804AbcKPR1w (ORCPT ); Wed, 16 Nov 2016 12:27:52 -0500 Received: from mx3-phx2.redhat.com ([209.132.183.24]:33790 "EHLO mx3-phx2.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751741AbcKPR1v (ORCPT ); Wed, 16 Nov 2016 12:27:51 -0500 Date: Wed, 16 Nov 2016 12:27:48 -0500 (EST) From: Paolo Bonzini To: Radim =?utf-8?B?S3LEjW3DocWZ?= Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, mtosatti@redhat.com Message-ID: <1216909500.13168461.1479317268749.JavaMail.zimbra@redhat.com> In-Reply-To: <20161116161030.GA7678@potion> References: <1479145881-20828-1-git-send-email-pbonzini@redhat.com> <20161116161030.GA7678@potion> Subject: Re: [PATCH v2] KVM: x86: do not go through vcpu in __get_kvmclock_ns MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [10.4.164.1, 10.5.101.130] X-Mailer: Zimbra 8.0.6_GA_5922 (ZimbraWebClient - FF49 (Linux)/8.0.6_GA_5922) Thread-Topic: x86: do not go through vcpu in __get_kvmclock_ns Thread-Index: 7Q8LaNKCeE2ALBVWl0jG5mEriYUHRg== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2795 Lines: 70 > > - if (vcpu->arch.hv_clock.flags & PVCLOCK_TSC_STABLE_BIT) { > > - u64 tsc = kvm_read_l1_tsc(vcpu, rdtsc()); > > - ns = __pvclock_read_cycles(&vcpu->arch.hv_clock, tsc); > > - } else { > > - ns = ktime_get_boot_ns() + ka->kvmclock_offset; > > - } > > If we access the "global" master clock, it would be better to prevent it > from changing under our hands with > spin_lock(&ka->pvclock_gtod_sync_lock). Yes, good idea. > > + if (!ka->use_master_clock) > > + return ktime_get_boot_ns() + ka->kvmclock_offset; > > > > - return ns; > > + hv_clock.tsc_timestamp = ka->master_cycle_now; > > + hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset; > > + kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL, > > + &hv_clock.tsc_shift, > > + &hv_clock.tsc_to_system_mul); > > Doesn't this result in a minor drift with scaled clock, because the > guest can be combining two systems that approximate frequency? You mean instead of doing read_l1_tsc? > 1) tsc_shift and tsc_to_system_mul for kvmclock scaling > 2) hardware TSC scaling ratio > > If we are on a 7654321 kHz TSC and TSC-ratio scale to 1234567 kHz and > then tsc_shift+tsc_to_system_mul kvmclock-scale to 1000000 kHz, we > should be using multipliers of > 0.161290204578564186163606151349022336533834941074459772460... and > 0.810000591300431649315104000025920018921613812778083328000..., > to achieve that. Those multipliers cannot be precisely expressed in > what we have (shifts and 64/32 bit multipliers with intermediate values > only up to 128 bits), so performing the scaling will result in slightly > incorrect frequency. > > The result of combining two operations that alter the freqency is quite > unlikely to cancel out and produce the same result as an operation that > uses a different shift+multiplier to scale in one step, so I think that > we aren't getting the same time as the guest with TSC-scaling is seeing. I think you get pretty good precision, since 30 fractional bits are more or less equivalent to nanosecond precision. For example, cutting the two ratios above to 30 fractional bits I get respectively 173184038/2^30 and 869731512/2^30. Multiplying them gives 140279173/2^30 which matches exactly the fixed point representation of 1000000/7654321. Since the TSC scaling frequency has a larger precision (32 or 48 bits), you should get at most 1 ulp error, which is not bad. Paolo > (I'd be happier if we didn't ignore this drift when the whole endeavor > started just to get rid of a drift, but introducing a minor bug is still > improving the situation -- I'm ok with first two changes only.) > > > + return __pvclock_read_cycles(&hv_clock, rdtsc()); > > } > > > > u64 get_kvmclock_ns(struct kvm *kvm) > > -- > > 1.8.3.1 > > >