X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
References: <20c9c21619aa44363c2c7503db1581cb816a1c0f.camel@redhat.com>
In-Reply-To:
 <20c9c21619aa44363c2c7503db1581cb816a1c0f.camel@redhat.com>
From: Jim Mattson
Date: Thu, 21 Dec 2023 11:09:57 -0800
Subject: Re: RFC: NTP adjustments interfere with KVM emulation of TSC deadline timers
To: Maxim Levitsky
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Paolo Bonzini,
 Sean Christopherson, Marc Zyngier, Thomas Gleixner, Vitaly Kuznetsov
Content-Type: text/plain; charset="UTF-8"

On Thu, Dec 21, 2023 at 8:52 AM Maxim Levitsky wrote:
>
> Hi!
>
> Recently I was tasked with triage of the failures of 'vmx_preemption_timer'
> that happen in our kernel CI pipeline.
>
> The test usually fails because L2 observes TSC after the
> preemption timer deadline, before the VM exit happens.
>
> This happens because KVM emulates nested preemption timer with HR timers,
> so it converts the preemption timer value to nanoseconds, taking into account
> tsc scaling and host tsc frequency, and sets HR timer.
>
> HR timer however as I found out the hard way is bound to CLOCK_MONOTONIC,
> and thus its rate can be adjusted by NTP, which means that it can run slower or
> faster than KVM expects, which can result in the interrupt arriving earlier,
> or late, which is what is happening.
>
> This is how you can reproduce it on an Intel machine:
>
> 1. stop the NTP daemon:
>      sudo systemctl stop chronyd.service
> 2. introduce a small error in the system time:
>      sudo date -s "$(date)"
>
> 3. start NTP daemon:
>      sudo chronyd -d -n (for debug) or start the systemd service again
>
> 4.
>    run the vmx_preemption_timer test a few times until it fails:
>
>
> I did some research and it looks like I am not the first to encounter this:
>
> From the ARM side there was an attempt to support CLOCK_MONOTONIC_RAW with
> timer subsystem which was even merged but then reverted due to issues:
>
> https://lore.kernel.org/all/1452879670-16133-3-git-send-email-marc.zyngier@arm.com/T/#u
>
> It looks like this issue was later worked around in the ARM code:
>
>
> commit 1c5631c73fc2261a5df64a72c155cb53dcdc0c45
> Author: Marc Zyngier
> Date:   Wed Apr 6 09:37:22 2016 +0100
>
>     KVM: arm/arm64: Handle forward time correction gracefully
>
>     On a host that runs NTP, corrections can have a direct impact on
>     the background timer that we program on the behalf of a vcpu.
>
>     In particular, NTP performing a forward correction will result in
>     a timer expiring sooner than expected from a guest point of view.
>     Not a big deal, we kick the vcpu anyway.
>
>     But on wake-up, the vcpu thread is going to perform a check to
>     find out whether or not it should block. And at that point, the
>     timer check is going to say "timer has not expired yet, go back
>     to sleep". This results in the timer event being lost forever.
>
>     There are multiple ways to handle this. One would be record that
>     the timer has expired and let kvm_cpu_has_pending_timer return
>     true in that case, but that would be fairly invasive. Another is
>     to check for the "short sleep" condition in the hrtimer callback,
>     and restart the timer for the remaining time when the condition
>     is detected.
>
>     This patch implements the latter, with a bit of refactoring in
>     order to avoid too much code duplication.
>
>     Cc:
>     Reported-by: Alexander Graf
>     Reviewed-by: Alexander Graf
>     Signed-off-by: Marc Zyngier
>     Signed-off-by: Christoffer Dall
>
>
> So to solve this issue there are two options:
>
>
> 1. Have another go at implementing support for CLOCK_MONOTONIC_RAW timers.
>    I don't know if that is feasible and I would be very happy to hear feedback from you.
>
> 2. Also work this around in KVM. KVM does listen to changes in the timekeeping system
>    (kernel calls its update_pvclock_gtod), and it even notes rates of both regular and raw clocks.
>
>    When starting a HR timer I can adjust its period for the difference in rates, which will in most
>    cases produce a more correct result than what we have now, but will still fail if the rate
>    is changed at the same time the timer is started or before it expires.
>
>    Or I can also restart the timer, although that might cause more harm than
>    good to the accuracy.
>
>
> What do you think?

Is this what the "adaptive tuning" in the local APIC TSC_DEADLINE
timer is all about (lapic_timer_advance_ns = -1)? If so, can we
leverage that for the VMX-preemption timer as well?

>
> Best regards,
>      Maxim Levitsky
>
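
For readers following the thread, the arithmetic behind the options quoted
above can be sketched roughly as below. This is an illustrative user-space
Python sketch, not KVM code. All function and parameter names are
hypothetical, and it assumes (1) the preemption timer counts down once every
2**rate_shift TSC cycles, and (2) the regular and raw clocks share the same
shift, so their rate ratio is simply mono_mult / raw_mult.

```python
# Illustrative arithmetic only; none of these names are KVM or kernel APIs.


def preempt_ticks_to_raw_ns(ticks: int, rate_shift: int, tsc_khz: int) -> int:
    """Nanoseconds of TSC (raw) time covered by `ticks` preemption-timer ticks.

    Assumption: the timer counts down once every 2**rate_shift TSC cycles.
    """
    tsc_cycles = ticks << rate_shift
    # cycles / (kHz * 1000) seconds == cycles * 1_000_000 / kHz nanoseconds
    return tsc_cycles * 1_000_000 // tsc_khz


def raw_ns_to_mono_ns(raw_ns: int, mono_mult: int, raw_mult: int) -> int:
    """Option 2: scale a raw-clock duration to the NTP-adjusted clock.

    Assumption: both clocks use the same shift, so while the raw clock
    advances by raw_ns the adjusted clock advances by
    raw_ns * mono_mult / raw_mult. A timer bound to the adjusted clock
    would be programmed with this value.
    """
    return raw_ns * mono_mult // raw_mult


def remaining_raw_ns(now_tsc: int, deadline_tsc: int, tsc_khz: int) -> int:
    """The "short sleep" check from the quoted ARM commit message.

    Called when the timer fires: returns 0 if the TSC deadline has passed,
    otherwise the raw time still left, for which the timer would be
    re-armed instead of delivering the event early.
    """
    if now_tsc >= deadline_tsc:
        return 0
    return (deadline_tsc - now_tsc) * 1_000_000 // tsc_khz
```

With a 2 GHz TSC (tsc_khz = 2_000_000) and rate_shift = 5, 1000 timer ticks
span 32000 cycles, which preempt_ticks_to_raw_ns() converts to 16000 ns.
Scaling is still racy in the way the quoted text describes: if the rate
changes after the timer is armed, only the re-check on expiry catches it.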