Subject: Re: [tip:x86/asm] x86/kvm: On KVM re-enable (e.g. after suspend), update clocks
To: tglx@linutronix.de, bp@alien8.de, peterz@infradead.org, hpa@zytor.com,
    luto@amacapital.net, torvalds@linux-foundation.org, dvlasenk@redhat.com,
    linux-kernel@vger.kernel.org, mingo@kernel.org, brgerst@gmail.com,
    luto@kernel.org, linux-tip-commits@vger.kernel.org
References: <861716d768a1da6d1fd257b7972f8df13baf7f85.1449702533.git.luto@kernel.org>
From: Paolo Bonzini
Message-ID: <566E9776.2030808@redhat.com>
Date: Mon, 14 Dec 2015 11:18:30 +0100

On 14/12/2015 09:16, tip-bot for Andy Lutomirski wrote:
> Commit-ID:  677a73a9aa5433ea728200c26a7b3506d5eaa92b
> Gitweb:     http://git.kernel.org/tip/677a73a9aa5433ea728200c26a7b3506d5eaa92b
> Author:     Andy Lutomirski
> AuthorDate: Thu, 10 Dec 2015 19:20:18 -0800
> Committer:  Ingo Molnar
> CommitDate: Fri, 11 Dec 2015 08:56:02 +0100
>
> x86/kvm: On KVM re-enable (e.g. after suspend), update clocks
>
> This gets rid of the "did TSC go backwards" logic and just
> updates all clocks.  It should work better (no more disabling of
> fast timing) and more reliably (all of the clocks are actually
> updated).
>
> Signed-off-by: Andy Lutomirski
> Reviewed-by: Paolo Bonzini
> Cc: Andy Lutomirski
> Cc: Borislav Petkov
> Cc: Brian Gerst
> Cc: Denys Vlasenko
> Cc: H. Peter Anvin
> Cc: Linus Torvalds
> Cc: Peter Zijlstra
> Cc: Thomas Gleixner
> Cc: linux-mm@kvack.org
> Link: http://lkml.kernel.org/r/861716d768a1da6d1fd257b7972f8df13baf7f85.1449702533.git.luto@kernel.org
> Signed-off-by: Ingo Molnar
> ---
>  arch/x86/kvm/x86.c | 75 +++---------------------------------------------------
>  1 file changed, 3 insertions(+), 72 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 00462bd..6e32e87 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -123,8 +123,6 @@ module_param(tsc_tolerance_ppm, uint, S_IRUGO | S_IWUSR);
>  unsigned int __read_mostly lapic_timer_advance_ns = 0;
>  module_param(lapic_timer_advance_ns, uint, S_IRUGO | S_IWUSR);
>
> -static bool __read_mostly backwards_tsc_observed = false;
> -
>  #define KVM_NR_SHARED_MSRS 16
>
>  struct kvm_shared_msrs_global {
> @@ -1671,7 +1669,6 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm)
>  					&ka->master_cycle_now);
>
>  	ka->use_master_clock = host_tsc_clocksource && vcpus_matched
> -				&& !backwards_tsc_observed
>  				&& !ka->boot_vcpu_runs_old_kvmclock;
>
>  	if (ka->use_master_clock)
> @@ -7366,88 +7363,22 @@ int kvm_arch_hardware_enable(void)
>  	struct kvm_vcpu *vcpu;
>  	int i;
>  	int ret;
> -	u64 local_tsc;
> -	u64 max_tsc = 0;
> -	bool stable, backwards_tsc = false;
>
>  	kvm_shared_msr_cpu_online();
>  	ret = kvm_x86_ops->hardware_enable();
>  	if (ret != 0)
>  		return ret;
>
> -	local_tsc = rdtsc();
> -	stable = !check_tsc_unstable();
>  	list_for_each_entry(kvm, &vm_list, vm_list) {
>  		kvm_for_each_vcpu(i, vcpu, kvm) {
> -			if (!stable && vcpu->cpu == smp_processor_id())
> +			if (vcpu->cpu == smp_processor_id()) {
>  				kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> -			if (stable && vcpu->arch.last_host_tsc > local_tsc) {
> -				backwards_tsc = true;
> -				if (vcpu->arch.last_host_tsc > max_tsc)
> -					max_tsc = vcpu->arch.last_host_tsc;
> +				kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE,
> +						 vcpu);
>  			}
>  		}
>  	}
>
> -	/*
> -	 * Sometimes, even reliable TSCs go backwards.  This happens on
> -	 * platforms that reset TSC during suspend or hibernate actions, but
> -	 * maintain synchronization.  We must compensate.  Fortunately, we can
> -	 * detect that condition here, which happens early in CPU bringup,
> -	 * before any KVM threads can be running.  Unfortunately, we can't
> -	 * bring the TSCs fully up to date with real time, as we aren't yet far
> -	 * enough into CPU bringup that we know how much real time has actually
> -	 * elapsed; our helper function, get_kernel_ns() will be using boot
> -	 * variables that haven't been updated yet.
> -	 *
> -	 * So we simply find the maximum observed TSC above, then record the
> -	 * adjustment to TSC in each VCPU.  When the VCPU later gets loaded,
> -	 * the adjustment will be applied.  Note that we accumulate
> -	 * adjustments, in case multiple suspend cycles happen before some VCPU
> -	 * gets a chance to run again.  In the event that no KVM threads get a
> -	 * chance to run, we will miss the entire elapsed period, as we'll have
> -	 * reset last_host_tsc, so VCPUs will not have the TSC adjusted and may
> -	 * loose cycle time.  This isn't too big a deal, since the loss will be
> -	 * uniform across all VCPUs (not to mention the scenario is extremely
> -	 * unlikely). It is possible that a second hibernate recovery happens
> -	 * much faster than a first, causing the observed TSC here to be
> -	 * smaller; this would require additional padding adjustment, which is
> -	 * why we set last_host_tsc to the local tsc observed here.
> -	 *
> -	 * N.B. - this code below runs only on platforms with reliable TSC,
> -	 * as that is the only way backwards_tsc is set above.  Also note
> -	 * that this runs for ALL vcpus, which is not a bug; all VCPUs should
> -	 * have the same delta_cyc adjustment applied if backwards_tsc
> -	 * is detected.  Note further, this adjustment is only done once,
> -	 * as we reset last_host_tsc on all VCPUs to stop this from being
> -	 * called multiple times (one for each physical CPU bringup).
> -	 *
> -	 * Platforms with unreliable TSCs don't have to deal with this, they
> -	 * will be compensated by the logic in vcpu_load, which sets the TSC to
> -	 * catchup mode.  This will catchup all VCPUs to real time, but cannot
> -	 * guarantee that they stay in perfect synchronization.
> -	 */
> -	if (backwards_tsc) {
> -		u64 delta_cyc = max_tsc - local_tsc;
> -		backwards_tsc_observed = true;
> -		list_for_each_entry(kvm, &vm_list, vm_list) {
> -			kvm_for_each_vcpu(i, vcpu, kvm) {
> -				vcpu->arch.tsc_offset_adjustment += delta_cyc;
> -				vcpu->arch.last_host_tsc = local_tsc;
> -				kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
> -			}
> -
> -			/*
> -			 * We have to disable TSC offset matching.. if you were
> -			 * booting a VM while issuing an S4 host suspend....
> -			 * you may have some problem.  Solving this issue is
> -			 * left as an exercise to the reader.
> -			 */
> -			kvm->arch.last_tsc_nsec = 0;
> -			kvm->arch.last_tsc_write = 0;
> -		}
> -
> -	}
>  	return 0;
>  }
>
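For anyone skimming the removed hunk above: the deleted logic detected a
host TSC that had been reset across suspend (each vCPU's last observed
host TSC then sits *ahead* of the current reading), took the largest
such value, and accumulated the difference into a per-vCPU offset.  The
following is a standalone sketch of that arithmetic only, with invented
numbers and a toy struct in place of the real kvm_vcpu fields, not the
kernel code itself:

/*
 * Toy model of the removed backwards-TSC compensation.  The real code
 * read local_tsc with rdtsc() and walked vm_list/kvm_for_each_vcpu.
 */
#include <stdio.h>
#include <stdint.h>

struct toy_vcpu {
	uint64_t last_host_tsc;		/* host TSC when vCPU was last saved */
	uint64_t tsc_offset_adjustment;	/* accumulated compensation */
};

int main(void)
{
	uint64_t local_tsc = 1000;	/* TSC after resume: reset near zero */
	struct toy_vcpu vcpus[] = {
		{ .last_host_tsc = 5000000 },
		{ .last_host_tsc = 7000000 },
	};
	int nvcpus = 2, backwards_tsc = 0, i;
	uint64_t max_tsc = 0;

	/* Detection: some vCPU last saw a TSC ahead of the current one. */
	for (i = 0; i < nvcpus; i++) {
		if (vcpus[i].last_host_tsc > local_tsc) {
			backwards_tsc = 1;
			if (vcpus[i].last_host_tsc > max_tsc)
				max_tsc = vcpus[i].last_host_tsc;
		}
	}

	/*
	 * Compensation: push the largest observed gap into every vCPU's
	 * offset (+= so repeated suspend cycles accumulate) and reset
	 * last_host_tsc so the adjustment is applied only once.
	 */
	if (backwards_tsc) {
		uint64_t delta_cyc = max_tsc - local_tsc;

		for (i = 0; i < nvcpus; i++) {
			vcpus[i].tsc_offset_adjustment += delta_cyc;
			vcpus[i].last_host_tsc = local_tsc;
		}
		printf("adjusted all vCPUs forward by %llu cycles\n",
		       (unsigned long long)delta_cyc);
	}
	return 0;
}

The patch drops all of this in favour of unconditionally requesting
KVM_REQ_CLOCK_UPDATE and KVM_REQ_MASTERCLOCK_UPDATE when hardware is
re-enabled, letting the normal clock-update machinery resynchronize.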
NACK.  Please remove this patch from the tip tree.

Paolo