Received: by 2002:a05:7412:d8a:b0:e2:908c:2ebd with SMTP id b10csp2694062rdg; Mon, 16 Oct 2023 11:49:38 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFMwDXZ6Jzk6hnqu5n+Zh8jNtrbNsA3e2BYKYPZ8PBh/2o9L5pLyHK1FGoHS8+z3h3XWOvY X-Received: by 2002:a05:6359:1a47:b0:134:ea45:53d7 with SMTP id ru7-20020a0563591a4700b00134ea4553d7mr274536rwb.26.1697482177904; Mon, 16 Oct 2023 11:49:37 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1697482177; cv=none; d=google.com; s=arc-20160816; b=eK7a8Z+2ly+sZQ+iEOSklCerzF32j2x5qNnqZZ5tZazBne028TVDwz5QglRR4Zfavw C3OfbkSj9+tbbBEsogHHI3i8082b9ekqzo7DN9Dwae7/CE3ZZPfsc1NZ2ndZrllUZMdO 5B1RmZWHqS4uFxMlwZgr3sM470dgb0J0agQfn+DtklkLkKAgGCSgfjiTa97k6e8yUR2u Ei24UVLC0aQZAJDdcB3SLVQQsNMFm1pcHUhLCEgGfeRLlgSzwveSt6Eh3w/IQlI8LO0t b4rZk3cyh6xmMMZaqDvls0yJyKhhSJghSQF0xefCASuHGwkJK78zoUCEtBo0gHCaNW1v WY0A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=OAcCmmrGQB8I/RvBQ4RW2J7O69twzM6FPTQwq9j0qgo=; fh=kEwGWtJQc59vbmlm12aVprPFZ9WpoqRDDmJESEnXeTY=; b=pfOX5ot+FXPxCE2OjcCaaED+IlecGCIf3tQtFzpd9ol1dPvw6sUoZoik9k0rpGcFsJ 2iHTr//rR5X0zVVVb/pY05CNEuohBjZJXWz6YlkIKG7rcJwYnT/5pEYiml+R/ojTHufn aAMaFrCSwR+qS5iRlJqCxQcCMGbBj9dM9IxRz/WG3nPNlx0hAZXYcYJsWIN5xyi2cuDs uHNKmeWoqd3sYSp50uQrtyMOAc4gUUaV+K8qErS78ogcwBnpwywRzIsU2/plbod/M7VW srupA/azQ4qIr0BphaBseK5i9jdgMgJ9+YG8jBZcpRGJhQ5XoCEM0O2fDs42sbFzJVEr JjuA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=sQUXwWJB; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:5 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from groat.vger.email (groat.vger.email. [2620:137:e000::3:5]) by mx.google.com with ESMTPS id x67-20020a626346000000b006b7b42fe43asi359763pfb.185.2023.10.16.11.49.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 16 Oct 2023 11:49:37 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:5 as permitted sender) client-ip=2620:137:e000::3:5; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=sQUXwWJB; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:5 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by groat.vger.email (Postfix) with ESMTP id 8094980A3677; Mon, 16 Oct 2023 11:49:34 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at groat.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232302AbjJPStV (ORCPT + 99 others); Mon, 16 Oct 2023 14:49:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42530 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229459AbjJPStU (ORCPT ); Mon, 16 Oct 2023 14:49:20 -0400 Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D2DE2A2 for ; Mon, 16 Oct 2023 11:49:17 -0700 (PDT) Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-59b5a586da6so35960947b3.1 for ; Mon, 16 Oct 2023 11:49:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1697482157; x=1698086957; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=OAcCmmrGQB8I/RvBQ4RW2J7O69twzM6FPTQwq9j0qgo=; b=sQUXwWJB4uLiO8XNHOFB1l5FoizPEo+PZbBKASXQ6PvNmJgNQRz43U4wwj7Svj/WSO jzSuKhlJhzRgk3AIDZGD9KaSNM/ow5w8TLMXkKrHbmveoKmm8ek8FupGZ56qG1bLx36s Sn1oKWb0WDx/QYEGs+aEsAR3J0n9wf29zLsRM3YaKrnsbcyXMpZMFIVxk0USrJ6Ua3nR y1H53b0VTdZbor9LVRCitechTNkhf3NSCWtw/Uxm0Di4AWfe6VKXJJ8o3Ow/QywmTX4s xcr956MmQSPvfFTdXeD5NIXek4APw04zWSlLh0gVBRkdYn1tbhjHJLYPmRehOMWux/bh PXsQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697482157; x=1698086957; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=OAcCmmrGQB8I/RvBQ4RW2J7O69twzM6FPTQwq9j0qgo=; b=irXL/r7E0/9vBa3rQEA+iBrw/oWwCud9y8zJrIqQEdz9GK5OwPUaGAkXh1qRZIKlNE vmbF5UQJENr/24Cx5oZg0qeJyh0pb1kNUkhGhmPlSuDc6e/mJ2OF7Ln8XVFJ6WMeSOuI suXVLSn7Koe3NdKtz1SBW5HKqE0RzhB12cUCM+iEABDsTJTXSkJdr9BwT3CUwwfxmFt4 C7C5C7frLOp51BlG5xhwtTHNTxEkM0iFMBXKx9hTxbZ386hgYqv5rQMvmVMAHL92HgWP 0sxHzKAvh+9d1h3KXSZ24JBEpN5fytfwmmhYoJyqLmUaMxHOrqLylbQCVhMEErMCY+ZL puoA== X-Gm-Message-State: AOJu0Yzdw7yhlFpk1jMDenLS5oYg3PV00VnTWbRNGaOaSDvegu3oMebi hZAjlNirZy/iQixZ5xNWJGEkM1yKR24= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a81:c20e:0:b0:595:8166:7be with SMTP id z14-20020a81c20e000000b00595816607bemr12527ywc.0.1697482157142; Mon, 16 Oct 2023 11:49:17 -0700 (PDT) Date: Mon, 16 Oct 2023 11:49:15 -0700 In-Reply-To: <03afed7eb3c1e5f4b2b8ecfd8616ae5c6f1819e9.camel@infradead.org> Mime-Version: 1.0 References: <34057852-f6c0-d6d5-261f-bbb5fa056425@oracle.com> <8f3493ca4c0e726d5c3876bb7dd2cfc432d9deaa.camel@infradead.org> <993cc7f9-a134-8086-3410-b915fe5db7a5@oracle.com> <03afed7eb3c1e5f4b2b8ecfd8616ae5c6f1819e9.camel@infradead.org> Message-ID: Subject: Re: [PATCH RFC 1/1] KVM: x86: add param to update master clock periodically From: Sean Christopherson To: David Woodhouse Cc: Dongli Zhang , Joe Jin , x86@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, pbonzini@redhat.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com Content-Type: text/plain; charset="us-ascii" X-Spam-Status: No, score=-8.4 required=5.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on groat.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (groat.vger.email [0.0.0.0]); Mon, 16 Oct 2023 11:49:34 -0700 (PDT) On Mon, Oct 16, 2023, David Woodhouse wrote: > On Mon, 2023-10-16 at 08:47 -0700, Dongli Zhang wrote: > > Suppose we are discussing a non-permanenet solution, I would suggest: > > > > 1. Document something to accept that kvm-clock (or pvclock on KVM, including Xen > > on KVM) is not good enough in some cases, e.g., vCPU hotplug. > > I still don't understand the vCPU hotplug case. > > In the case where the TSC is actually sane, why would we need to reset > the masterclock on vCPU hotplug? > > The new vCPU gets its TSC synchronised to the others, and its kvmclock > parameters (mul/shift/offset based on the guest TSC) can be *precisely* > the same as the other vCPUs too, can't they? Why reset anything? Aha! I think I finally figured out why KVM behaves the way it does. The unnecessary masterclock updates effectively came from: commit 7f187922ddf6b67f2999a76dcb71663097b75497 Author: Marcelo Tosatti Date: Tue Nov 4 21:30:44 2014 -0200 KVM: x86: update masterclock values on TSC writes When the guest writes to the TSC, the masterclock TSC copy must be updated as well along with the TSC_OFFSET update, otherwise a negative tsc_timestamp is calculated at kvm_guest_time_update. Once "if (!vcpus_matched && ka->use_master_clock)" is simplified to "if (ka->use_master_clock)", the corresponding "if (!ka->use_master_clock)" becomes redundant, so remove the do_request boolean and collapse everything into a single condition. Before that, KVM only re-synced the masterclock if it was enabled or disabled, i.e. KVM behaved as we want it to behave. Note, at the time of the above commit, VMX synchronized TSC on *guest* writes to MSR_IA32_TSC: case MSR_IA32_TSC: kvm_write_tsc(vcpu, msr_info); break; That got changed by commit 0c899c25d754 ("KVM: x86: do not attempt TSC synchronization on guest writes"), but I don't think the guest angle is actually relevant to the fix. AFAICT, a write from the host would suffer the same problem. But knowing that KVM synced on guest writes is crucial to understanding the changelog. In kvm_write_tsc(), except for KVM's wonderful "less than 1 second" hack, KVM snapshotted the vCPU's current TSC *and* the current time in nanoseconds, where kvm->arch.cur_tsc_nsec is the current host kernel time in nanoseconds. ns = get_kernel_ns(); ... if (usdiff < USEC_PER_SEC && vcpu->arch.virtual_tsc_khz == kvm->arch.last_tsc_khz) { ... } else { /* * We split periods of matched TSC writes into generations. * For each generation, we track the original measured * nanosecond time, offset, and write, so if TSCs are in * sync, we can match exact offset, and if not, we can match * exact software computation in compute_guest_tsc() * * These values are tracked in kvm->arch.cur_xxx variables. */ kvm->arch.cur_tsc_generation++; kvm->arch.cur_tsc_nsec = ns; kvm->arch.cur_tsc_write = data; kvm->arch.cur_tsc_offset = offset; matched = false; pr_debug("kvm: new tsc generation %llu, clock %llu\n", kvm->arch.cur_tsc_generation, data); } ... /* Keep track of which generation this VCPU has synchronized to */ vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation; vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec; vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write; Note that the above sets matched to false! But because kvm_track_tsc_matching() looks for matched+1, i.e. doesn't require the first vCPU to match itself, KVM would immediately compute vcpus_matched as true for VMs with a single vCPU. As a result, KVM would skip the masterlock update, even though a new TSC generation was created. vcpus_matched = (ka->nr_vcpus_matched_tsc + 1 == atomic_read(&vcpu->kvm->online_vcpus)); if (vcpus_matched && gtod->clock.vclock_mode == VCLOCK_TSC) if (!ka->use_master_clock) do_request = 1; if (!vcpus_matched && ka->use_master_clock) do_request = 1; if (do_request) kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu); On hardware without TSC scaling support, vcpu->tsc_catchup is set to true if the guest TSC frequency is faster than the host TSC frequency, even if the TSC is otherwise stable. And for that mode, kvm_guest_time_update(), by way of compute_guest_tsc(), uses vcpu->arch.this_tsc_nsec, a.k.a. the kernel time at the last TSC write, to compute the guest TSC relative to kernel time: static u64 compute_guest_tsc(struct kvm_vcpu *vcpu, s64 kernel_ns) { u64 tsc = pvclock_scale_delta(kernel_ns-vcpu->arch.this_tsc_nsec, vcpu->arch.virtual_tsc_mult, vcpu->arch.virtual_tsc_shift); tsc += vcpu->arch.this_tsc_write; return tsc; } Except the @kernel_ns passed to compute_guest_tsc() isn't the current kernel time, it's the masterclock snapshot! spin_lock(&ka->pvclock_gtod_sync_lock); use_master_clock = ka->use_master_clock; if (use_master_clock) { host_tsc = ka->master_cycle_now; kernel_ns = ka->master_kernel_ns; } spin_unlock(&ka->pvclock_gtod_sync_lock); if (vcpu->tsc_catchup) { u64 tsc = compute_guest_tsc(v, kernel_ns); if (tsc > tsc_timestamp) { adjust_tsc_offset_guest(v, tsc - tsc_timestamp); tsc_timestamp = tsc; } } And so the "kernel_ns-vcpu->arch.this_tsc_nsec" is *guaranteed* to generate a negative value, because this_tsc_nsec was captured after ka->master_kernel_ns. Forcing a masterclock update essentially fudged around that problem, but in a heavy handed way that introduced undesirable side effects, i.e. unnecessarily forces a masterclock update when a new vCPU joins the party via hotplug. Compile tested only, but the below should fix the vCPU hotplug case. Then someone (not me) just needs to figure out why kvm_xen_shared_info_init() forces a masterclock update. I still think we should clean up the periodic sync code, but I don't think we need to periodically sync the masterclock. --- arch/x86/kvm/x86.c | 29 ++++++++++++++++------------- 1 file changed, 16 insertions(+), 13 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c54e1133e0d3..f0a607b6fc31 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2510,26 +2510,29 @@ static inline int gtod_is_based_on_tsc(int mode) } #endif -static void kvm_track_tsc_matching(struct kvm_vcpu *vcpu) +static void kvm_track_tsc_matching(struct kvm_vcpu *vcpu, bool new_generation) { #ifdef CONFIG_X86_64 - bool vcpus_matched; struct kvm_arch *ka = &vcpu->kvm->arch; struct pvclock_gtod_data *gtod = &pvclock_gtod_data; - vcpus_matched = (ka->nr_vcpus_matched_tsc + 1 == - atomic_read(&vcpu->kvm->online_vcpus)); + /* + * To use the masterclock, the host clocksource must be based on TSC + * and all vCPUs must have matching TSCs. Note, the count for matching + * vCPUs doesn't include the reference vCPU, hence "+1". + */ + bool use_master_clock = (ka->nr_vcpus_matched_tsc + 1 == + atomic_read(&vcpu->kvm->online_vcpus)) && + gtod_is_based_on_tsc(gtod->clock.vclock_mode); /* - * Once the masterclock is enabled, always perform request in - * order to update it. - * - * In order to enable masterclock, the host clocksource must be TSC - * and the vcpus need to have matched TSCs. When that happens, - * perform request to enable masterclock. + * Request a masterclock update if the masterclock needs to be toggled + * on/off, or when starting a new generation and the masterclock is + * enabled (compute_guest_tsc() requires the masterclock snaphot to be + * taken _after_ the new generation is created). */ - if (ka->use_master_clock || - (gtod_is_based_on_tsc(gtod->clock.vclock_mode) && vcpus_matched)) + if ((ka->use_master_clock && new_generation) || + (ka->use_master_clock != use_master_clock)) kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu); trace_kvm_track_tsc(vcpu->vcpu_id, ka->nr_vcpus_matched_tsc, @@ -2706,7 +2709,7 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc, vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec; vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write; - kvm_track_tsc_matching(vcpu); + kvm_track_tsc_matching(vcpu, !matched); } static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 *user_value) base-commit: dfdc8b7884b50e3bfa635292973b530a97689f12 --