Date: Wed, 16 Mar 2016 23:06:00 +0100
From: Radim Krcmar <rkrcmar@redhat.com>
To: Andy Lutomirski <luto@kernel.org>
Cc: x86@kernel.org, Marcelo Tosatti <mtosatti@redhat.com>,
        Paolo Bonzini <pbonzini@redhat.com>, linux-kernel@vger.kernel.org,
        kvm@vger.kernel.org, Alexander Graf <agraf@suse.de>
Subject: Re: [PATCH 1/5] x86/kvm: On KVM re-enable (e.g. after suspend),
 update clocks
Message-ID: <20160316220502.GA7040@potion.brq.redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <861716d768a1da6d1fd257b7972f8df13baf7f85.1449702533.git.luto@kernel.org>
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2560
Lines: 63

2015-12-09 15:12-0800, Andy Lutomirski:
> This gets rid of the "did TSC go backwards" logic and just updates
> all clocks.  It should work better (no more disabling of fast
> timing) and more reliably (all of the clocks are actually updated).
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> @@ -7369,88 +7366,22 @@ int kvm_arch_hardware_enable(void)
>  	list_for_each_entry(kvm, &vm_list, vm_list) {
>  		kvm_for_each_vcpu(i, vcpu, kvm) {
> +			if (vcpu->cpu == smp_processor_id()) {

(vmm_exclusive sets vcpu->cpu to -1, so KVM_REQ_MASTERCLOCK_UPDATE might
 not run, but vmm_exclusive probably doesn't work anyway.)

>  				kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> +				kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE,
> +						vcpu);
>  			}

(Requesting KVM_REQ_MASTERCLOCK_UPDATE once per VM is enough.)

> -	if (backwards_tsc) {
> -		u64 delta_cyc = max_tsc - local_tsc;
> -		backwards_tsc_observed = true;
> -		list_for_each_entry(kvm, &vm_list, vm_list) {
> -			kvm_for_each_vcpu(i, vcpu, kvm) {
> -				vcpu->arch.tsc_offset_adjustment += delta_cyc;
> -				vcpu->arch.last_host_tsc = local_tsc;

tsc_offset_adjustment was set for

  	/* Apply any externally detected TSC adjustments (due to suspend) */
  	if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
  		adjust_tsc_offset_host(vcpu, vcpu->arch.tsc_offset_adjustment);
  		vcpu->arch.tsc_offset_adjustment = 0;
  		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
  	}

Guest TSC is going to jump backward with this patch, which would make
the guest think that a lot of cycles passed.  This has no bearing on
guest timekeeping, because the guest shouldn't be using raw TSC.
If we wanted to do something though, there are at least two options:
1) Fake that TSC continued at roughly its specified rate:  compute how
   many cycles could have elapsed while the CPU was suspended (using
   host time before/after suspend and guest TSC frequency) and adjust
   guest TSC.
2) Resume guest TSC at its last cycle before suspend.
   (Roughly what KVM does now.)

What are your opinions on TSC faking?

Thanks.


---
Btw. I'll be spending some days to decipher kvmclock, so I'd also fix
the masterclock+suspend issue, if you don't mind ... So far, I don't
even see a reason to update kvmclock on kvm_arch_hardware_enable().
Suspend is a condition that we want to handle, so kvm_resume would be a
better place, but we handle suspend only because TSC and timekeeping has
changed, so I think that the right place is in their event notifiers.