Date: Wed, 16 Mar 2016 23:06:00 +0100
From: Radim Krcmar
To: Andy Lutomirski
Cc: x86@kernel.org, Marcelo Tosatti, Paolo Bonzini, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Alexander Graf
Subject: Re: [PATCH 1/5] x86/kvm: On KVM re-enable (e.g. after suspend), update clocks
Message-ID: <20160316220502.GA7040@potion.brq.redhat.com>
In-Reply-To: <861716d768a1da6d1fd257b7972f8df13baf7f85.1449702533.git.luto@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

2015-12-09 15:12-0800, Andy Lutomirski:
> This gets rid of the "did TSC go backwards" logic and just updates
> all clocks.  It should work better (no more disabling of fast
> timing) and more reliably (all of the clocks are actually updated).
>
> Signed-off-by: Andy Lutomirski
> ---
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> @@ -7369,88 +7366,22 @@ int kvm_arch_hardware_enable(void)
>  	list_for_each_entry(kvm, &vm_list, vm_list) {
>  		kvm_for_each_vcpu(i, vcpu, kvm) {
> +			if (vcpu->cpu == smp_processor_id()) {

(vmm_exclusive sets vcpu->cpu to -1, so KVM_REQ_MASTERCLOCK_UPDATE might
 not run, but vmm_exclusive probably doesn't work anyway.)

>  				kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> +				kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE,
> +						 vcpu);
>  			}

(Requesting KVM_REQ_MASTERCLOCK_UPDATE once per VM is enough.)

> -	if (backwards_tsc) {
> -		u64 delta_cyc = max_tsc - local_tsc;
> -		backwards_tsc_observed = true;
> -		list_for_each_entry(kvm, &vm_list, vm_list) {
> -			kvm_for_each_vcpu(i, vcpu, kvm) {
> -				vcpu->arch.tsc_offset_adjustment += delta_cyc;
> -				vcpu->arch.last_host_tsc = local_tsc;

tsc_offset_adjustment was set for

	/* Apply any externally detected TSC adjustments (due to suspend) */
	if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
		adjust_tsc_offset_host(vcpu, vcpu->arch.tsc_offset_adjustment);
		vcpu->arch.tsc_offset_adjustment = 0;
		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
	}

Guest TSC is going to jump backward with this patch, which would make
the guest think that a lot of cycles passed.  This has no bearing on
guest timekeeping, because the guest shouldn't be using raw TSC.
If we wanted to do something though, there are at least two options:

1) Fake that TSC continued at roughly its specified rate:  compute how
   many cycles could have elapsed while the CPU was suspended (using
   host time before/after suspend and guest TSC frequency) and adjust
   guest TSC.
2) Resume guest TSC at its last cycle before suspend.
   (Roughly what KVM does now.)

What are your opinions on TSC faking?

Thanks.

---
Btw. I'll be spending some days deciphering kvmclock, so I'd also fix
the masterclock+suspend issue, if you don't mind ...  So far, I don't
even see a reason to update kvmclock on kvm_arch_hardware_enable().
Suspend is a condition that we want to handle, so kvm_resume would be a
better place, but we handle suspend only because TSC and timekeeping
have changed, so I think that the right place is in their event
notifiers.
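
To make the two options above concrete, here is a rough sketch of the
arithmetic only.  It is not KVM code: suspended_guest_cycles() and
guest_tsc_after_resume() are hypothetical helpers, and I assume the
suspend duration is available in nanoseconds and the guest TSC
frequency in kHz.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a KVM function): estimate how many guest TSC
 * cycles would have elapsed during a suspend of suspended_ns nanoseconds
 * if the guest TSC had kept counting at guest_tsc_khz.
 *
 * cycles = ns * kHz / 10^6.  The division is split into whole
 * milliseconds plus a remainder so that intermediate products stay
 * small for realistic suspend durations.
 */
static uint64_t suspended_guest_cycles(uint64_t suspended_ns,
				       uint32_t guest_tsc_khz)
{
	uint64_t ms = suspended_ns / 1000000;
	uint64_t rem_ns = suspended_ns % 1000000;

	return ms * guest_tsc_khz + rem_ns * guest_tsc_khz / 1000000;
}

/*
 * Hypothetical helper: guest TSC value to resume with.
 * fake_elapsed != 0 is option 1 (pretend the TSC kept running),
 * fake_elapsed == 0 is option 2 (continue from the last value).
 */
static uint64_t guest_tsc_after_resume(uint64_t guest_tsc_before,
				       uint64_t suspended_ns,
				       uint32_t guest_tsc_khz,
				       int fake_elapsed)
{
	if (!fake_elapsed)
		return guest_tsc_before;

	return guest_tsc_before +
	       suspended_guest_cycles(suspended_ns, guest_tsc_khz);
}
```

With a 2 GHz guest (guest_tsc_khz = 2000000), one second of suspend
adds 2000000000 cycles under option 1 and nothing under option 2.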