From: David Woodhouse
To: kvm@vger.kernel.org
Cc: Paolo Bonzini, Jonathan Corbet, Sean Christopherson, Thomas Gleixner,
    Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org,
    "H. Peter Anvin", Paul Durrant, Peter Zijlstra, Juri Lelli,
    Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
    Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider, Shuah Khan,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    jalliste@amazon.co.uk, sveith@amazon.de, zide.chen@intel.com,
    Dongli Zhang, Chenyi Qiang
Subject: [RFC PATCH v3 00/21] Cleaning up the KVM clock mess
Date: Wed, 22 May 2024 01:16:55 +0100
Message-ID: <20240522001817.619072-1-dwmw2@infradead.org>

Clean up the KVM clock mess somewhat so that it is either based on the
guest TSC ("master clock" mode), or on the host CLOCK_MONOTONIC_RAW in
cases where the TSC isn't usable.

Eliminate the third variant where it was based directly on the *host*
TSC, due to bugs in e.g. __get_kvmclock().

Kill off the last vestiges of the KVM clock being based on
CLOCK_MONOTONIC instead of CLOCK_MONOTONIC_RAW and thus being subject to
NTP skew.

Fix up migration support to allow the KVM clock to be saved/restored as
an arithmetic function of the guest TSC, since that's what it actually
is in the *common* case so it can be migrated precisely.
Or at least to within ±1 ns, which is good enough, as discussed in
https://lore.kernel.org/kvm/c8dca08bf848e663f192de6705bf04aa3966e856.camel@infradead.org

In v2 of this series, TSC synchronization is improved and simplified a
bit too, and we allow masterclock mode to be used even when the guest
TSCs are out of sync, as long as they're running at the same *rate*.
The different *offset* shouldn't matter.

And the kvm_get_time_scale() function annoyed me by being entirely
opaque, so I studied it until my brain hurt and then added some
comments.

In v2 I also dropped the commits which were removing the periodic clock
syncs. In v3 I put them back again but *only* for the non-masterclock
mode, along with cleaning up some other gratuitous clock jumps while in
masterclock mode. And Jack's patch to move the pvclock structure to
uapi.

I also fixed the bug pointed out by Chenyi Qiang, that I was failing to
set vcpu->arch.this_tsc_{nsec,write} after removing the cur_tsc_*
fields.

I also included patches to fix advertised steal time going backwards,
and to make the guest more resilient to it. Those may end up being
split out and submitted under separate cover (with selftests).

Still needs more comprehensive selftests.
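
For readers less familiar with why the KVM clock is "an arithmetic
function of the guest TSC", here is a minimal sketch of the pvclock
calculation a guest performs. It is an illustration only, not code from
this series: struct pvclock_params and kvmclock_from_tsc() are
hypothetical names, the fields mirror the correspondingly named members
of struct pvclock_vcpu_time_info, and unsigned __int128 is assumed to be
available (a GCC/Clang extension).

```c
#include <stdint.h>

/* Hypothetical subset of struct pvclock_vcpu_time_info (illustration only). */
struct pvclock_params {
	uint64_t tsc_timestamp;     /* guest TSC value at the reference point */
	uint64_t system_time;       /* KVM clock (ns) at the reference point */
	uint32_t tsc_to_system_mul; /* 32-bit fixed-point multiplier */
	int8_t   tsc_shift;         /* pre-shift applied to the TSC delta */
};

/* Scale a TSC delta to nanoseconds: shift first, then multiply by mul/2^32. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul_frac, int8_t shift)
{
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;

	/* 64x32 multiply keeping the upper bits; the low 32 bits are truncated. */
	return (uint64_t)(((unsigned __int128)delta * mul_frac) >> 32);
}

/* KVM clock in ns as a pure function of the guest TSC reading. */
static uint64_t kvmclock_from_tsc(const struct pvclock_params *p,
				  uint64_t guest_tsc)
{
	return p->system_time +
	       scale_delta(guest_tsc - p->tsc_timestamp,
			   p->tsc_to_system_mul, p->tsc_shift);
}
```

With tsc_to_system_mul = 0x80000000 and tsc_shift = 1 each TSC tick
counts as 1 ns (a 1 GHz TSC), so a guest TSC 1000 ticks past
tsc_timestamp yields system_time + 1000. Because the result depends only
on these four parameters and the guest TSC, preserving them together
with the guest TSC is in principle enough to reproduce the clock on
another host.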

David Woodhouse (18):
  KVM: x86/xen: Do not corrupt KVM clock in kvm_xen_shared_info_init()
  KVM: x86: Improve accuracy of KVM clock when TSC scaling is in force
  KVM: x86: Explicitly disable TSC scaling without CONSTANT_TSC
  KVM: x86: Add KVM_VCPU_TSC_SCALE and fix the documentation on TSC migration
  KVM: x86: Avoid NTP frequency skew for KVM clock on 32-bit host
  KVM: x86: Fix KVM clock precision in __get_kvmclock()
  KVM: x86: Fix software TSC upscaling in kvm_update_guest_time()
  KVM: x86: Simplify and comment kvm_get_time_scale()
  KVM: x86: Remove implicit rdtsc() from kvm_compute_l1_tsc_offset()
  KVM: x86: Improve synchronization in kvm_synchronize_tsc()
  KVM: x86: Kill cur_tsc_{nsec,offset,write} fields
  KVM: x86: Allow KVM master clock mode when TSCs are offset from each other
  KVM: x86: Factor out kvm_use_master_clock()
  KVM: x86: Avoid global clock update on setting KVM clock MSR
  KVM: x86: Avoid gratuitous global clock reload in kvm_arch_vcpu_load()
  KVM: x86: Avoid periodic KVM clock updates in master clock mode
  KVM: x86/xen: Prevent runstate times from becoming negative
  sched/cputime: Cope with steal time going backwards or negative

Jack Allister (3):
  KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for accurate KVM clock migration
  UAPI: x86: Move pvclock-abi to UAPI for x86 platforms
  KVM: selftests: Add KVM/PV clock selftest to prove timer correction

 Documentation/virt/kvm/api.rst                    |  37 ++
 Documentation/virt/kvm/devices/vcpu.rst           | 115 +++-
 arch/x86/include/asm/kvm_host.h                   |  15 +-
 arch/x86/include/uapi/asm/kvm.h                   |   6 +
 arch/x86/include/{ => uapi}/asm/pvclock-abi.h     |  24 +-
 arch/x86/kvm/svm/svm.c                            |   3 +-
 arch/x86/kvm/vmx/vmx.c                            |   2 +-
 arch/x86/kvm/x86.c                                | 716 +++++++++++++++-------
 arch/x86/kvm/xen.c                                |  22 +-
 include/uapi/linux/kvm.h                          |   3 +
 kernel/sched/cputime.c                            |  20 +-
 tools/testing/selftests/kvm/Makefile              |   1 +
 tools/testing/selftests/kvm/x86_64/pvclock_test.c | 192 ++++++
 13 files changed, 884 insertions(+), 272 deletions(-)