Message-ID: <3664e8ec-1fa1-48c0-a80d-546b7f6cd671@intel.com>
Date: Fri, 19 Apr 2024 10:13:16 -0700
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] KVM: selftests: Add KVM/PV clock selftest to prove timer drift correction
From: "Chen, Zide"
To: Jack Allister, Paolo Bonzini, Jonathan Corbet, Sean Christopherson, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, "H. Peter Anvin", Shuah Khan
Cc: David Woodhouse, Paul Durrant, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org
References: <20240408220705.7637-1-jalliste@amazon.com> <20240408220705.7637-3-jalliste@amazon.com>
In-Reply-To: <20240408220705.7637-3-jalliste@amazon.com>

On 4/8/2024 3:07 PM, Jack Allister wrote:
> This test proves that there is an inherent KVM/PV clock drift away from the
> guest TSC when KVM decides to update the PV time information structure due
> to a KVM_REQ_MASTERCLOCK_UPDATE.
> This drift is exacerbated when a guest is
> using TSC scaling and running at a different frequency to the host TSC [1].
> It also proves that the KVM_[GS]ET_CLOCK_GUEST API is working to mitigate the
> drift from TSC to within ±1ns.
>
> The test simply records the PVTI (PV time information) at time of guest
> creation, after KVM has updated its mapped PVTI structure and once the
> correction has taken place.
>
> A singular point in time is then recorded via the guest TSC and is used to
> calculate a PV clock value using each of the 3 PVTI structures.
>
> As seen below a drift of ~3500ns is observed if no correction has taken
> place after KVM has updated the PVTI via master clock update. However,
> after the correction a delta of at most 1ns can be seen.
>
> * selftests: kvm: pvclock_test
> * scaling tsc from 2999999KHz to 1499999KHz
> * before=5038374946 uncorrected=5038371437 corrected=5038374945
> * delta_uncorrected=3509 delta_corrected=1
>
> Clocksource check code has been borrowed from [2].
>
> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=451a707813ae
> [2]: https://lore.kernel.org/kvm/20240106083346.29180-1-dongli.zhang@oracle.com/
>
> Signed-off-by: Jack Allister
> CC: David Woodhouse
> CC: Paul Durrant
> ---
>  tools/testing/selftests/kvm/Makefile          |   1 +
>  .../selftests/kvm/x86_64/pvclock_test.c       | 223 ++++++++++++++++++
>  2 files changed, 224 insertions(+)
>  create mode 100644 tools/testing/selftests/kvm/x86_64/pvclock_test.c
>
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index 741c7dc16afc..02ee1205bbed 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -87,6 +87,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/pmu_counters_test
>  TEST_GEN_PROGS_x86_64 += x86_64/pmu_event_filter_test
>  TEST_GEN_PROGS_x86_64 += x86_64/private_mem_conversions_test
>  TEST_GEN_PROGS_x86_64 += x86_64/private_mem_kvm_exits_test
> +TEST_GEN_PROGS_x86_64 += x86_64/pvclock_test
>  TEST_GEN_PROGS_x86_64 += x86_64/set_boot_cpu_id
>  TEST_GEN_PROGS_x86_64 += x86_64/set_sregs_test
>  TEST_GEN_PROGS_x86_64 += x86_64/smaller_maxphyaddr_emulation_test
> diff --git a/tools/testing/selftests/kvm/x86_64/pvclock_test.c b/tools/testing/selftests/kvm/x86_64/pvclock_test.c
> new file mode 100644
> index 000000000000..172ef4d19c60
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/pvclock_test.c
> @@ -0,0 +1,223 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright © 2024, Amazon.com, Inc. or its affiliates.
> + *
> + * Tests for pvclock API
> + * KVM_SET_CLOCK_GUEST/KVM_GET_CLOCK_GUEST
> + */
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include "test_util.h"
> +#include "kvm_util.h"
> +#include "processor.h"
> +
> +enum {
> +	STAGE_FIRST_BOOT,
> +	STAGE_UNCORRECTED,
> +	STAGE_CORRECTED,
> +	NUM_STAGES
> +};
> +
> +#define KVMCLOCK_GPA 0xc0000000ull
> +#define KVMCLOCK_SIZE sizeof(struct pvclock_vcpu_time_info)
> +
> +static void trigger_pvti_update(vm_paddr_t pvti_pa)
> +{
> +	/*
> +	 * We need a way to trigger KVM to update the fields
> +	 * in the PV time info. The easiest way to do this is
> +	 * to temporarily switch to the old KVM system time
> +	 * method and then switch back to the new one.
> +	 */
> +	wrmsr(MSR_KVM_SYSTEM_TIME, pvti_pa | KVM_MSR_ENABLED);
> +	wrmsr(MSR_KVM_SYSTEM_TIME_NEW, pvti_pa | KVM_MSR_ENABLED);
> +}
> +
> +static void guest_code(vm_paddr_t pvti_pa)
> +{
> +	struct pvclock_vcpu_time_info *pvti_va =
> +		(struct pvclock_vcpu_time_info *)pvti_pa;
> +
> +	struct pvclock_vcpu_time_info pvti_boot;
> +	struct pvclock_vcpu_time_info pvti_uncorrected;
> +	struct pvclock_vcpu_time_info pvti_corrected;
> +	uint64_t cycles_boot;
> +	uint64_t cycles_uncorrected;
> +	uint64_t cycles_corrected;
> +	uint64_t tsc_guest;
> +
> +	/*
> +	 * Setup the KVMCLOCK in the guest & store the original
> +	 * PV time structure that is used.
> +	 */
> +	wrmsr(MSR_KVM_SYSTEM_TIME_NEW, pvti_pa | KVM_MSR_ENABLED);
> +	pvti_boot = *pvti_va;
> +	GUEST_SYNC(STAGE_FIRST_BOOT);
> +
> +	/*
> +	 * Trigger an update of the PVTI, if we calculate
> +	 * the KVM clock using this structure we'll see
> +	 * a drift from the TSC.
> +	 */
> +	trigger_pvti_update(pvti_pa);
> +	pvti_uncorrected = *pvti_va;
> +	GUEST_SYNC(STAGE_UNCORRECTED);
> +
> +	/*
> +	 * The test should have triggered the correction by this
> +	 * point in time. We have a copy of each of the PVTI structs
> +	 * at each stage now.
> +	 *
> +	 * Let's sample the timestamp at a SINGLE point in time and
> +	 * then calculate what the KVM clock would be using the PVTI
> +	 * from each stage.
> +	 *
> +	 * Then return each of these values to the tester.
> +	 */
> +	pvti_corrected = *pvti_va;
> +	tsc_guest = rdtsc();
> +
> +	cycles_boot = __pvclock_read_cycles(&pvti_boot, tsc_guest);
> +	cycles_uncorrected = __pvclock_read_cycles(&pvti_uncorrected, tsc_guest);
> +	cycles_corrected = __pvclock_read_cycles(&pvti_corrected, tsc_guest);
> +
> +	GUEST_SYNC_ARGS(STAGE_CORRECTED, cycles_boot, cycles_uncorrected,
> +			cycles_corrected, 0);
> +}
> +
> +static void run_test(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
> +{
> +	struct ucall uc;
> +	uint64_t ucall_reason;
> +	struct pvclock_vcpu_time_info pvti_before;
> +	uint64_t before, uncorrected, corrected;
> +	int64_t delta_uncorrected, delta_corrected;
> +
> +	/* Loop through each stage of the test. */
> +	while (true) {
> +
> +		/* Start/restart the running vCPU code. */
> +		vcpu_run(vcpu);
> +		TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
> +
> +		/* Retrieve and verify our stage. */
> +		ucall_reason = get_ucall(vcpu, &uc);
> +		TEST_ASSERT(ucall_reason == UCALL_SYNC,
> +			    "Unhandled ucall reason=%lu",
> +			    ucall_reason);
> +
> +		/* Run host specific code relating to stage. */
> +		switch (uc.args[1]) {
> +		case STAGE_FIRST_BOOT:
> +			/* Store the KVM clock values before an update. */
> +			vm_ioctl(vm, KVM_GET_CLOCK_GUEST, &pvti_before);
> +
> +			/* Sleep for a set amount of time to induce drift. */
> +			sleep(5);
> +			break;
> +
> +		case STAGE_UNCORRECTED:
> +			/* Restore the KVM clock values. */
> +			vm_ioctl(vm, KVM_SET_CLOCK_GUEST, &pvti_before);
> +			break;
> +
> +		case STAGE_CORRECTED:
> +			/* Query the clock information and verify delta. */
> +			before = uc.args[2];
> +			uncorrected = uc.args[3];
> +			corrected = uc.args[4];
> +
> +			delta_uncorrected = before - uncorrected;
> +			delta_corrected = before - corrected;
> +
> +			pr_info("before=%lu uncorrected=%lu corrected=%lu\n",
> +				before, uncorrected, corrected);
> +
> +			pr_info("delta_uncorrected=%ld delta_corrected=%ld\n",
> +				delta_uncorrected, delta_corrected);
> +
> +			TEST_ASSERT((delta_corrected <= 1) && (delta_corrected >= -1),
> +				    "larger than expected delta detected = %ld", delta_corrected);

I'm wondering what the underlying theory is that guarantees we can always
achieve ±1ns accuracy. I tested it on a Sapphire Rapids with a 2100MHz TSC
frequency, and I see delta_corrected=2 in ~2% of cases.