Subject: Re: [RFC] cputime: Introduce option to force full dynticks accounting on NOHZ & NOHZ_IDLE CPUs
Date: Wed, 21 Feb 2024 17:11:09 +0000
From: Nicolas Saenz Julienne
To: Sean Christopherson
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20240219175735.33171-1-nsaenz@amazon.com>

On Wed Feb 21, 2024 at 4:24 PM UTC, Sean Christopherson wrote:
> On Tue, Feb 20, 2024, Nicolas Saenz Julienne wrote:
> > Hi Sean,
> >
> > On Tue Feb 20, 2024 at 4:18 PM UTC, Sean Christopherson wrote:
> > > On Mon, Feb 19, 2024, Nicolas Saenz Julienne wrote:
> > > > Under certain extreme conditions, the tick-based cputime
> > > > accounting may produce inaccurate data. For instance, guest CPU usage
> > > > is sensitive to interrupts firing right before the tick's expiration.
>
> Ah, this confused me.  The "right before" is a bit misleading.  It's more like
> "shortly before", because if the interrupt that occurs due to the guest's tick
> arrives _right_ before the host tick expires, then commit 160457140187 should
> avoid horrific accounting.
>
> > > > This forces the guest into kernel context, and has that time slice
> > > > wrongly accounted as system time. This issue is exacerbated if the
> > > > interrupt source is in sync with the tick,
>
> It's worth calling out why this can happen, to make it clear that getting into
> such syncopation can happen quite naturally.  E.g. something like:
>
>       interrupt source is in sync with the tick, e.g. if the guest's tick
>       is configured to run at the same frequency as the host tick, and the
>       guest tick is ever so slightly ahead of the host tick.

I'll incorporate both comments into the description. :)

> > > > significantly skewing usage metrics towards system time.
> > >
> > > ...
> > >
> > > > NOTE: This wasn't tested in depth, and it's mostly intended to highlight
> > > > the issue we're trying to solve. Also ccing KVM folks, since it's
> > > > relevant to guest CPU usage accounting.
> > >
> > > How bad is the synchronization issue on upstream kernels?  We tried to address
> > > that in commit 160457140187 ("KVM: x86: Defer vtime accounting 'til after IRQ handling").
> > >
> > > I don't expect it to be foolproof, but it'd be good to know if there's a blatant
> > > flaw and/or easily closed hole.
> >
> > The issue is not really about the interrupts themselves, but their side
> > effects.
> >
> > For instance, let's say the guest sets up a Hyper-V stimer that
> > consistently fires 1 us before the preemption tick.
> > The preemption tick will expire while the vCPU thread is running with
> > !PF_VCPU (maybe inside kvm_hv_process_stimers() for ex.). As long as
> > they both keep in sync, you'll get a 100% system usage. I was able to
> > reproduce this one through kvm-unit-tests, but the race window is too
> > small to keep the interrupts in sync for long periods of time, yet
> > still capable of producing random system usage bursts (which is
> > unacceptable for some use-cases).
> >
> > Other use-cases have bigger race windows and managed to maintain high
> > system CPU usage over long periods of time. For example, with user-space
> > HPET emulation, or KVM+Xen (don't know the fine details on these, but
> > VIRT_CPU_ACCOUNTING_GEN fixes the mis-accounting). It all comes down to
> > the same situation. Something triggers an exit, and the vCPU thread goes
> > past 'vtime_account_guest_exit()' just in time for the tick interrupt to
> > show up.
>
> I suspect the common "problem" with those flows is that emulating the guest timer
> interrupt is (a) slow, relatively speaking and (b) done with interrupts enabled.
>
> E.g. on VMX, the TSC deadline timer is emulated via VMX preemption timer, and both
> the programming of the guest's TSC deadline timer and the handling of the expiration
> interrupt are done in the VM-Exit fastpath with IRQs disabled.  As a result, even
> if the host tick interrupt is a hair behind the guest tick, it doesn't affect
> accounting because the host tick interrupt will never be delivered while KVM is
> emulating the guest's periodic tick.
>
> I'm guessing that if you tested on SVM (or a guest that doesn't use the APIC timer
> in deadline mode), which doesn't utilize the fastpath since KVM needs to bounce
> through hrtimers, then you'd see similar accounting problems even without using
> any of the problematic "slow" timer sources.

That's right, the "problem" will show up when periodically emulating
something with interrupts enabled.
The slower the emulation, the bigger the race window. It's just a
limitation of tick-based accounting; I have the feeling there isn't much
KVM can do.

Nicolas
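
For illustration only, the race described above can be sketched with a toy
model. This is not kernel code and not how KVM is implemented: the function
name simulate() and the parameters tick_us, lead_us and emu_us are
hypothetical, and the 4000 us tick period is an assumed value. The model only
captures the sampling effect: the tick charges a whole period to whatever
context it observes when it fires, so a short emulation window that always
overlaps the tick gets billed as 100% system time.

```python
def simulate(tick_us=4000, lead_us=1, emu_us=5, n_ticks=1000):
    """Toy model of tick-sampled accounting (all parameters are assumptions).

    tick_us: host tick period in microseconds.
    lead_us: how long before each host tick the guest timer fires.
    emu_us:  time the vCPU thread spends emulating in host context.

    Returns (actual_system_fraction, sampled_system_fraction).
    """
    sampled_system = 0
    for k in range(1, n_ticks + 1):
        tick_time = k * tick_us
        exit_start = tick_time - lead_us  # guest timer fires, vCPU exits
        exit_end = exit_start + emu_us    # emulation done, guest re-entered
        # Tick-based accounting charges the whole period to the context
        # observed at the instant the tick fires.
        if exit_start <= tick_time < exit_end:
            sampled_system += 1
    # Real time spent in host context: one emulation window per tick period.
    actual = emu_us / tick_us
    return actual, sampled_system / n_ticks


if __name__ == "__main__":
    actual, sampled = simulate()
    print(f"actual system share:  {actual:.5f}")
    print(f"sampled system share: {sampled:.5f}")
```

In this model a larger emu_us widens the range of lead_us values for which the
tick lands inside the emulation window, which matches the point that slower
emulation means a bigger race window.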