Date: Wed, 16 Nov 2016 19:10:02 +0100
From: Radim =?utf-8?B?S3LEjW3DocWZ?= <rkrcmar@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, mtosatti@redhat.com
Subject: Re: [PATCH v2] KVM: x86: do not go through vcpu in __get_kvmclock_ns
Message-ID: <20161116181001.GB7678@potion>
References: <1479145881-20828-1-git-send-email-pbonzini@redhat.com>
 <20161116161030.GA7678@potion>
 <1216909500.13168461.1479317268749.JavaMail.zimbra@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1216909500.13168461.1479317268749.JavaMail.zimbra@redhat.com>
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2216
Lines: 48

2016-11-16 12:27-0500, Paolo Bonzini:
>> > +	if (!ka->use_master_clock)
>> > +		return ktime_get_boot_ns() + ka->kvmclock_offset;
>> >  
>> > -	return ns;
>> > +	hv_clock.tsc_timestamp = ka->master_cycle_now;
>> > +	hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset;
>> > +	kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL,
>> > +			   &hv_clock.tsc_shift,
>> > +			   &hv_clock.tsc_to_system_mul);
>> 
>> Doesn't this result in a minor drift with scaled clock, because the
>> guest can be combining two systems that approximate frequency?
> 
> You mean instead of doing read_l1_tsc?

Yes, when we avoid the scaling and use host TSC directly.

>>   1) tsc_shift and tsc_to_system_mul for kvmclock scaling
>>   2) hardware TSC scaling ratio
>> 
>> If we are on a 7654321 kHz TSC and TSC-ratio scale to 1234567 kHz and
>> then tsc_shift+tsc_to_system_mul kvmclock-scale to 1000000 kHz, we
>> should be using multipliers of
>>   0.161290204578564186163606151349022336533834941074459772460...  and
>>   0.810000591300431649315104000025920018921613812778083328000...,
>> to achieve that.  Those multipliers cannot be precisely expressed in
>> what we have (shifts and 64/32 bit multipliers with intermediate values
>> only up to 128 bits), so performing the scaling will result in slightly
>> incorrect frequency.
>> 
>> The result of combining two operations that alter the freqency is quite
>> unlikely to cancel out and produce the same result as an operation that
>> uses a different shift+multiplier to scale in one step, so I think that
>> we aren't getting the same time as the guest with TSC-scaling is seeing.
> 
> I think you get pretty good precision, since 30 fractional bits are more
> or less equivalent to nanosecond precision.  For example, cutting the two
> ratios above to 30 fractional bits I get respectively 173184038/2^30
> and 869731512/2^30.  Multiplying them gives 140279173/2^30 which matches
> exactly the fixed point representation of 1000000/7654321.
> 
> Since the TSC scaling frequency has a larger precision (32 or 48 bits),
> you should get at most 1 ulp error, which is not bad.

True, it would take many years to accumulate into a second.

Thanks.