Received: by 2002:a05:7412:31a9:b0:e2:908c:2ebd with SMTP id et41csp4065940rdb; Thu, 14 Sep 2023 10:39:58 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHfP06Cq9mRt4zEcHfyUwCd1cHzR93InL3lw8RMKXXIK+0PrvkBYCysI+YR2FHj8+FZ7aod X-Received: by 2002:a05:6358:110b:b0:142:e748:8f0a with SMTP id f11-20020a056358110b00b00142e7488f0amr4206314rwi.1.1694713198479; Thu, 14 Sep 2023 10:39:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1694713198; cv=none; d=google.com; s=arc-20160816; b=F7D3ybOHeUXCD0tjIgZ1QoEb4+hond8gvXT6p8ADQiJ4vp4Oc3Dgt9tAmHXwspRvQO XATg4oS/n3LiNzaRr9BQV8yPwJwiFg/IN+kBaExDbrA2FP3q3xfxdC09VxkppZwE9XEG mU5yq2dJ1J4FJNwcqTde9Ik1kJbSK+T/ApyYmt2nalx8ubmeiWOmfqOWo8dgZyuFVlMo rafOpahB8n8VPH8eWP2henrUNw4y5nM+kZDUgieYSMolLCz021Kl6geFcgjE2DsWXeCD +Qxs1jPyNoLWKyfg+Q69L/vSH0MtMMfWs50E2miP0YicW6EwRwYx3kwLeDhfvLWKgvVx 0oGA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:references:in-reply-to:user-agent:subject:cc:to:from :date:dkim-signature; bh=BYNLXBo0bL67GPqYQl9HGYIz6cphXRkKLo87ZYOM0ns=; fh=bYjdTrPTPcc11cEIBniN5VQ02JkotIPlDG7oqwAXzvU=; b=Sctwj6rpCO37QgghxFCw6jX9ye7oJHkiIlxkx4mxIn/3gD4innipfJywsgRLPHaIlo av9XGbpc8tmQFQYwE3QSnHk0/p9YFUNKpntr5jYXJdyyKg01nDQPyMSztJKr584JkJkg NWT8uWb8BwmnS+Ecb8dRQ9bH5aij6HA1aSkxZVZvH/MrsJ4iAqSWfpyuYa8QBRoBZ8PW MNZbblfCB4ZV6qbGEzQKMpw+SYxsZqzkQVCdmDUrBspwSW1n44VAfMKyoDgvzM0i4NBD WY0HBkSksrpzYrdsAi/YFPy4fMRisnSYz6MESmajmV3iCqzySIuTZykKBGxHkq4I4dWD iaZw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@infradead.org header.s=casper.20170209 header.b=o+iKtUwm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from snail.vger.email (snail.vger.email. [2620:137:e000::3:7]) by mx.google.com with ESMTPS id d11-20020a63360b000000b00573f89ac5a1si1805311pga.102.2023.09.14.10.39.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Sep 2023 10:39:58 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) client-ip=2620:137:e000::3:7; Authentication-Results: mx.google.com; dkim=pass header.i=@infradead.org header.s=casper.20170209 header.b=o+iKtUwm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id 3503C81BDDFB; Wed, 13 Sep 2023 02:30:15 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238527AbjIMJaR (ORCPT + 99 others); Wed, 13 Sep 2023 05:30:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48240 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229712AbjIMJaQ (ORCPT ); Wed, 13 Sep 2023 05:30:16 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 49A0F10DD; Wed, 13 Sep 2023 02:30:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Transfer-Encoding:Content-Type: MIME-Version:Message-ID:References:In-Reply-To:Subject:CC:To:From:Date:Sender :Reply-To:Content-ID:Content-Description; bh=BYNLXBo0bL67GPqYQl9HGYIz6cphXRkKLo87ZYOM0ns=; b=o+iKtUwmbYyLhUvUBpleugkAp7 kOmB4pyjzNZKP88yrfQ3243OyFHKsZgnYr+3vhW3rAqNnQx6lcCi7Jy1ZTt631WiPcEr0Aoqj9AVs PXDISGpMnbPnq2ntOnMGbaRrfEqeIB9sk0Tz1Z83TvnQgQsprQuzmRTn54TsCuPlHT0uP7GWXQQ1P uSVnz1pc1LIkOGiQTJzunrK1kaViDH9P2InUKMNW5jsKpy+FY+TBRv7V+V+GjqqMnuvWOxMWVDg5M l0i6coOwFfbh3AQS/4upLtGog+jp8D02+BWC8ml1Y4Do5gaS/iYnAEiTfQzg7fvHQTl6q3Bbo6hIA 4kiIqZkQ==; Received: from [89.27.170.32] (helo=[127.0.0.1]) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1qgMCF-00D0hd-Eq; Wed, 13 Sep 2023 09:30:07 +0000 Date: Wed, 13 Sep 2023 11:30:06 +0200 From: David Woodhouse To: Like Xu , Sean Christopherson CC: Paolo Bonzini , Oliver Upton , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: =?US-ASCII?Q?Re=3A_=5BPATCH_v5=5D_KVM=3A_x86/tsc=3A_Don=27t_sync_T?= =?US-ASCII?Q?SC_on_the_first_write_in_state_restoration?= User-Agent: K-9 Mail for Android In-Reply-To: <90194cd0-61d8-18b9-980a-b46f903409b4@gmail.com> References: <20230913072113.78885-1-likexu@tencent.com> <90194cd0-61d8-18b9-980a-b46f903409b4@gmail.com> Message-ID: <461B7217-7AA7-479E-9060-772E243CB03D@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Wed, 13 Sep 2023 02:30:15 -0700 (PDT) On 13 September 2023 11:17:49 CEST, Like Xu wrote: >> I think you also need to set kvm->arch=2Euser_set_tsc() in >> kvm_arch_tsc_set_attr(), don't you? > >How about: > >diff --git a/arch/x86/kvm/x86=2Ec b/arch/x86/kvm/x86=2Ec >index c55cc60769db=2E=2E374965f66137 100644 >--- a/arch/x86/kvm/x86=2Ec >+++ b/arch/x86/kvm/x86=2Ec >@@ -5545,6 +5545,7 @@ static int kvm_arch_tsc_set_attr(struct kvm_vcpu *v= cpu, > tsc =3D kvm_scale_tsc(rdtsc(), vcpu->arch=2El1_tsc_scaling_ratio) + of= fset; > ns =3D get_kvmclock_base_ns(); > >+ kvm->arch=2Euser_set_tsc =3D true; > __kvm_synchronize_tsc(vcpu, offset, tsc, ns, matched); > raw_spin_unlock_irqrestore(&kvm->arch=2Etsc_write_lock, flags); Yep, that looks good=2E >> This comment isn't quite right; it wants to use some excerpt of the >> commit message I've suggested above=2E > >How about: > >@@ -2735,20 +2735,34 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *= vcpu, u64 data) > * kvm_clock stable after CPU hotplug > */ > synchronizing =3D true; >- } else { >+ } else if (kvm->arch=2Euser_set_tsc) { > u64 tsc_exp =3D kvm->arch=2Elast_tsc_write + > nsec_to_cycles(vcpu, elapsed); > u64 tsc_hz =3D vcpu->arch=2Evirtual_tsc_khz * 1000LL; > /* >- * Special case: TSC write with a small delta (1 second) >- * of virtual cycle time against real time is >- * interpreted as an attempt to synchronize the CPU=2E >+ * Here lies UAPI baggage: user-initiated TSC write with >+ * a small delta (1 second) of virtual cycle time >+ * against real time is interpreted as an attempt to >+ * synchronize the CPU=2E Much better, thanks=2E But I don't much like "an attempt to synchronize th= e CPU"=2E=20 In my response to Sean I objected to that classification=2E Userspace is j= ust *setting* the TSC=2E There is no dedicated intent to "synchronize" it= =2E It just sets it, and the value just *might* happen to be in sync with a= nother vCPU=2E=20 It's just that our API is so fundamentally broken that it *can't* be in sy= nc, so we "help" it a bit if it looks close=2E So maybe=2E=2E=2E Here lies UAPI baggage: when a user-initiated TSC write has a small delta = (1 second) of virtual cycle time against the previously set vCPU, we assume= that they were intended to be in sync and the delta was only due to the ra= cy nature of the legacy API=2E >+ * This trick falls down when restoring a guest which genuinely >+ * has been running for less time than the 1 second of imprecision >+ * which we allow for in the legacy API=2E In this case, the first >+ * value written by userspace (on any vCPU) should not be subject >+ * to this 'correction' to make it sync up with values that only >+ * from from the kernel's default vCPU creation=2E Make the 1-second >+ * slop hack only trigger if flag is already set=2E >+ * >+ * The correct answer is for the VMM not to use the legacy API=2E >> Userspace used to be able to write zero to force a sync=2E You've remov= ed >> that ability from the ABI, and haven't even mentioned it=2E Must we? > >Will continue to use "bool user_initiated" for lack of a better move=2E Why? Can't we treat an explicit zero write just the same as when the kerne= l does it?