Received: by 2002:a05:6a10:c604:0:0:0:0 with SMTP id y4csp881980pxt; Fri, 6 Aug 2021 16:51:48 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxRBuSj+o1ZSLte76mTqIoB361JTxc5ohutDY54lZ1UtpteO70pSi2G56nNd9LSdEiuS4V+ X-Received: by 2002:a05:6602:1647:: with SMTP id y7mr292141iow.30.1628293908656; Fri, 06 Aug 2021 16:51:48 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1628293908; cv=none; d=google.com; s=arc-20160816; b=RuMNdzCqoscPyQs94cXMAgKc+EByMwyk/lBMm2ADgwUsBz3QNv2A/JDqnno3ZLkxWM /WgZxKnvkNEODru+cKh0nCQZFHoJbqOwACFtwttiVebDDUhO4qQExdMz6ALdzCnOUqcb kzVkwI53WFeqllU7LLd2GV/oIXyO4x66HWp9n4hzogKWECWp7KoDC1x2RgSiXxlZOSHq 3XW8zLfcTBlO+TFMBunYGM+RAQGlN6BOu54Wrt9xauz6LaGFb4nqe5Ql4y+8pOfMq1eH KijdfsSHjl3bRQq/k817Rut47araaCS9PVqd/BkQOGmBR3icsUQAmamzOLOaRK4lS/v8 cppw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=lk/rklBGJbLUUQZyoq+gs1+H5iN5EBlS7CuTTnKKDKA=; b=ux31C2ifF/7rlFe6lmM0MI4sHHTk3EiXxWNwfd4Z/Fzxi+JfLxCzdUtDjTUxTXh2uP gFtCypBOYOCfNfuEuyKpXdPw4Cpn8Z62naaItE9f9SXJTnEQ73OilWx3Gy43t1mzqy1j ngJCzXhJBYevZk5E/xZYpjk9vyr47gknj8LsrL5KF0YlrABBPB6QJ+PoDGiNk07xsb7s mIB1cXoeUp0/1DfOE5ESspSqVt7DxhAXhcYHYt0n1PXC4kZib6T2wc1dkcsRXYd8vrWs PLqFAANUs6jsFIrQqX/hReQMx2VwwBz0eH4edqSJrYG5ay6JT6taVmp2JUYJvaDoovrt NBdg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=Z3015pl6; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id j6si12519172jat.30.2021.08.06.16.51.37; Fri, 06 Aug 2021 16:51:48 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=Z3015pl6; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231543AbhHFQUa (ORCPT + 99 others); Fri, 6 Aug 2021 12:20:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55916 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231453AbhHFQUa (ORCPT ); Fri, 6 Aug 2021 12:20:30 -0400 Received: from mail-pl1-x634.google.com (mail-pl1-x634.google.com [IPv6:2607:f8b0:4864:20::634]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 962C6C061798 for ; Fri, 6 Aug 2021 09:20:13 -0700 (PDT) Received: by mail-pl1-x634.google.com with SMTP id u16so7766278ple.2 for ; Fri, 06 Aug 2021 09:20:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=lk/rklBGJbLUUQZyoq+gs1+H5iN5EBlS7CuTTnKKDKA=; b=Z3015pl6wjzFCtaiptoQofYXucCx2JNzuIpRkBDN/T/0deeTpz66FIaPsdm9qTiMAh EHX6DAN50du5HC2tIEmUBFCUw/i18du9l+Sru0/uh/TUezaeTCcs1+Y7+OSUxB1gnij8 kINmzKoOE/QX1hReUhFX3A7UT5M2Zh0CZHY5fHNhndH2S1Rmqm3501DTrWjCYe9KP3dC TWfK5RaSSpWFeaePZYkf5ZawD8wQ6eb0p7XTfViPYX9qgdIwfOgRFaSAq7FcOnfZHyIM aPQvG3xBb8TG/4euCYq2rj1iOi+DrmEjSGVnkFc1uZOczxKQOErzL4/N6HwybGHw85UG yl1w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=lk/rklBGJbLUUQZyoq+gs1+H5iN5EBlS7CuTTnKKDKA=; b=nWtRkLvAYDSMAfv/6r/cPSZfH6UBrZim3FVXGWRd1SIBMhaNyqMDOQUabLXwgnigqO NC0UVHcktlwiMy+7peoSrzo2fxulAnexNYClP9IEszYBdA1Z4HXanAxhs8ZwYfYaqAn2 I56jzQ4uqOMNlR6vQIt8C0IEEe14geNBjzDyZzPApb3RpABAGtnMnr3Embox80T2pot6 qlVJr4D2KcOiUWjHL7/5872Ukh/WpfTfEQ4ojkllBNsBa3oVYCjD4SS6bSMLBYYG3q9q yBsF+Ebnk9uaDnep5y5EZhfIcebP4ew3LnI3xzIUaP/46ZTAsnNkwahYT7hfduGuYBEu u/Sg== X-Gm-Message-State: AOAM533v4xdLnlIOeDH5iVn9f3OgA1bSFtl3BHyS1duAyDFMh1rWOa3p aE01KBgDUenKbadzZQXfEhSqNg== X-Received: by 2002:a17:902:f295:b029:12c:a253:7087 with SMTP id k21-20020a170902f295b029012ca2537087mr9252966plc.6.1628266812776; Fri, 06 Aug 2021 09:20:12 -0700 (PDT) Received: from google.com (157.214.185.35.bc.googleusercontent.com. [35.185.214.157]) by smtp.gmail.com with ESMTPSA id 6sm11553582pfg.108.2021.08.06.09.20.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 06 Aug 2021 09:20:12 -0700 (PDT) Date: Fri, 6 Aug 2021 16:20:08 +0000 From: Sean Christopherson To: Suleiman Souhlal Cc: Paolo Bonzini , ssouhlal@freebsd.org, hikalium@chromium.org, senozhatsky@chromium.org, Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H. Peter Anvin" , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] kvm,x86: Use the refined tsc rate for the guest tsc. Message-ID: References: <20210803075914.3070477-1-suleiman@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210803075914.3070477-1-suleiman@google.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org "KVM: x86:" is the preferred scope. And for whatever reason, punctuation at the end of the shortlog is almost always omitted. On Tue, Aug 03, 2021, Suleiman Souhlal wrote: > Prior to this change, In the changelog proper, please state what change is being made before launching into the effects of the change. Oftentimes that does mean restating the shortlog, but it's very helpful to reviewers/readers to provide a single cohesive/coherent explanation. > the initial tsc rate used by kvm would be the unrefined rate, instead of the > refined rate that is derived later at boot and used for timekeeping. This can > cause time to advance at different rates between the host and the guest. This needs a lot more explanation, with a clear statement of the direct effects of the change and with explicit references to variables and probably functions. Normally I prefer abstraction over explicit code references, but in this case there are too many ambiguous terms to understand the intended change without a lot of staring. E.g. at first blush, it's not obvious that "boot" refers to the host kernel boot, especially for those of us that load KVM as a module. And the statement "the initial tsc rate used by kvm would be the unrefined rate" will not always hold true, e.g. when KVM is loaded long after boot. That also confused me. IIUC, this "fixes" a race where KVM is initialized before the second call to tsc_refine_calibration_work() completes. Fixes in quotes because it doesn't actually fix the race, it just papers over the problem to get the desired behavior. If the race can't be truly fixed, the changelog should explain why it can't be fixed, otherwise fudging our way around the race is not justifiable. Ideally, we would find a way to fix the race, e.g. by ensuring KVM can't load or by stalling KVM initialization until refinement completes (or fails). tsc_khz is consumed by KVM in multiple paths, and initializing KVM before tsc_khz calibration is fully refined means some part of KVM will use the wrong tsc_khz, e.g. the VMX preemption timer. Due to sanity checks in tsc_refine_calibration_work(), the delta won't be more than 1%, but it's still far from ideal. > Signed-off-by: Suleiman Souhlal > --- > arch/x86/kvm/x86.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 4116567f3d44..1e59bb326c10 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -2199,6 +2199,7 @@ static atomic_t kvm_guest_has_master_clock = ATOMIC_INIT(0); > #endif > > static DEFINE_PER_CPU(unsigned long, cpu_tsc_khz); > +static DEFINE_PER_CPU(bool, cpu_tsc_khz_changed); > static unsigned long max_tsc_khz; > > static u32 adjust_tsc_khz(u32 khz, s32 ppm) > @@ -2906,6 +2907,14 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) > kvm_make_request(KVM_REQ_CLOCK_UPDATE, v); > return 1; > } > + /* > + * Use the refined tsc_khz instead of the tsc_khz at boot (which was > + * not refined yet when we got it), As above, this does not hold true in all cases. And for anyone that isn't familiar with tsc_refine_calibration_work(), the "refined tsc_khz instead of the tsc_khz" blurb is borderline nonsensical. > + * If the frequency does change, it does not get refined any further, > + * so it is safe to use the one gotten from the notifiers. > + */ > + if (!__this_cpu_read(cpu_tsc_khz_changed)) > + tgt_tsc_khz = tsc_khz; > if (!use_master_clock) { > host_tsc = rdtsc(); > kernel_ns = get_kvmclock_base_ns(); > @@ -8086,6 +8095,8 @@ static void tsc_khz_changed(void *data) > khz = freq->new; > else if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) > khz = cpufreq_quick_get(raw_smp_processor_id()); > + if (khz) > + __this_cpu_write(cpu_tsc_khz_changed, true); On a system with a constant TSC and KVM loaded after boot, cpu_tsc_khz_changed will never be true. Ditto for the case where refinement fails. That may not be a functional issue, but it's confusing. This also fails to gate other readers of kvm_guest_time_update. In particular, KVM_GET_CLOCK -> get_kvmclock_ns() will use the "wrong" frequency when userspace is retrieving guest TSC. > if (!khz) > khz = tsc_khz; > __this_cpu_write(cpu_tsc_khz, khz); > -- > 2.32.0.554.ge1b32706d8-goog >