Date: Wed, 8 May 2024 20:32:40 -0700
From: "Paul E. McKenney"
To: Sean Christopherson
Cc: Leonardo Bras, Paolo Bonzini, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Marcelo Tosatti,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org
Subject: Re: [RFC PATCH v1 0/2] Avoid rcu_core() if CPU just left guest vcpu
Message-ID: <5fd66909-1250-4a91-aa71-93cb36ed4ad5@paulmck-laptop>
Reply-To: paulmck@kernel.org
References: <663a659d-3a6f-4bec-a84b-4dd5fd16c3c1@paulmck-laptop>
	<0e239143-65ed-445a-9782-e905527ea572@paulmck-laptop>

On Wed, May 08, 2024 at 07:01:29AM -0700, Sean Christopherson wrote:
> On Wed, May 08, 2024, Leonardo Bras wrote:
> > Something just hit me, and maybe I need to propose something more generic.
>
> Yes.  This is what I was trying to get across with my complaints about keying off
> of the last VM-Exit time.  It's effectively a broad stroke "this task will likely
> be quiescent soon" and so the core concept/functionality belongs in common code,
> not KVM.

OK, we could do something like the following wholly within RCU, namely
to make rcu_pending() refrain from invoking rcu_core() until the grace
period is at least the specified age, defaulting to zero (and to the
current behavior).

Perhaps something like the patch shown below.

Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

commit abc7cd2facdebf85aa075c567321589862f88542
Author: Paul E. McKenney
Date:   Wed May 8 20:11:58 2024 -0700

    rcu: Add rcutree.nocb_patience_delay to reduce nohz_full OS jitter

    If a CPU is running either a userspace application or a guest OS in
    nohz_full mode, it is possible for a system call to occur just as an
    RCU grace period is starting.  If that CPU also has the scheduling-clock
    tick enabled for any reason (such as a second runnable task), and if the
    system was booted with rcutree.use_softirq=0, then RCU can add insult to
    injury by awakening that CPU's rcuc kthread, resulting in yet another
    task and yet more OS jitter due to switching to that task, running it,
    and switching back.

    In addition, in the common case where that system call is not of
    excessively long duration, awakening the rcuc task is pointless.
    This pointlessness is due to the fact that the CPU will enter an extended
    quiescent state upon returning to the userspace application or guest OS.
    In this case, the rcuc kthread cannot do anything that the main RCU
    grace-period kthread cannot do on its behalf, at least if it is given
    a few additional milliseconds (for example, given the time duration
    specified by rcutree.jiffies_till_first_fqs, give or take scheduling
    delays).

    This commit therefore adds a rcutree.nocb_patience_delay kernel boot
    parameter that specifies the grace period age (in milliseconds)
    before which RCU will refrain from awakening the rcuc kthread.
    Preliminary experimentation suggests a value of 1000, that is,
    one second.  Increasing rcutree.nocb_patience_delay will increase
    grace-period latency and in turn increase memory footprint, so systems
    with constrained memory might choose a smaller value.  Systems with
    less-aggressive OS-jitter requirements might choose the default value
    of zero, which keeps the traditional immediate-wakeup behavior, thus
    avoiding increases in grace-period latency.

    Link: https://lore.kernel.org/all/20240328171949.743211-1-leobras@redhat.com/

    Reported-by: Leonardo Bras
    Suggested-by: Leonardo Bras
    Suggested-by: Sean Christopherson
    Signed-off-by: Paul E. McKenney

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 0a3b0fd1910e6..42383986e692b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4981,6 +4981,13 @@
 			the ->nocb_bypass queue.  The definition of "too
 			many" is supplied by this kernel boot parameter.
 
+	rcutree.nocb_patience_delay= [KNL]
+			On callback-offloaded (rcu_nocbs) CPUs, avoid
+			disturbing RCU unless the grace period has
+			reached the specified age in milliseconds.
+			Defaults to zero.  Large values will be capped
+			at five seconds.
+
 	rcutree.qhimark= [KNL]
 			Set threshold of queued RCU callbacks beyond which
 			batch limiting is disabled.
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 7560e204198bb..6e4b8b43855a0 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -176,6 +176,8 @@ static int gp_init_delay;
 module_param(gp_init_delay, int, 0444);
 static int gp_cleanup_delay;
 module_param(gp_cleanup_delay, int, 0444);
+static int nocb_patience_delay;
+module_param(nocb_patience_delay, int, 0444);
 
 // Add delay to rcu_read_unlock() for strict grace periods.
 static int rcu_unlock_delay;
@@ -4334,6 +4336,8 @@ EXPORT_SYMBOL_GPL(cond_synchronize_rcu_full);
 static int rcu_pending(int user)
 {
 	bool gp_in_progress;
+	unsigned long j = jiffies;
+	unsigned int patience = msecs_to_jiffies(nocb_patience_delay);
 	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 	struct rcu_node *rnp = rdp->mynode;
 
@@ -4347,11 +4351,13 @@ static int rcu_pending(int user)
 		return 1;
 
 	/* Is this a nohz_full CPU in userspace or idle?  (Ignore RCU if so.) */
-	if ((user || rcu_is_cpu_rrupt_from_idle()) && rcu_nohz_full_cpu())
+	gp_in_progress = rcu_gp_in_progress();
+	if ((user || rcu_is_cpu_rrupt_from_idle() ||
+	     (gp_in_progress && time_before(j + patience, rcu_state.gp_start))) &&
+	    rcu_nohz_full_cpu())
 		return 0;
 
 	/* Is the RCU core waiting for a quiescent state from this CPU? */
-	gp_in_progress = rcu_gp_in_progress();
 	if (rdp->core_needs_qs && !rdp->cpu_no_qs.b.norm && gp_in_progress)
 		return 1;
 
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 340bbefe5f652..174333d0e9507 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -93,6 +93,15 @@ static void __init rcu_bootup_announce_oddness(void)
 		pr_info("\tRCU debug GP init slowdown %d jiffies.\n", gp_init_delay);
 	if (gp_cleanup_delay)
 		pr_info("\tRCU debug GP cleanup slowdown %d jiffies.\n", gp_cleanup_delay);
+	if (nocb_patience_delay < 0) {
+		pr_info("\tRCU NOCB CPU patience negative (%d), resetting to zero.\n", nocb_patience_delay);
+		nocb_patience_delay = 0;
+	} else if (nocb_patience_delay > 5 * MSEC_PER_SEC) {
+		pr_info("\tRCU NOCB CPU patience too large (%d), resetting to %ld.\n", nocb_patience_delay, 5 * MSEC_PER_SEC);
+		nocb_patience_delay = 5 * MSEC_PER_SEC;
+	} else if (nocb_patience_delay) {
+		pr_info("\tRCU NOCB CPU patience set to %d milliseconds.\n", nocb_patience_delay);
+	}
 	if (!use_softirq)
 		pr_info("\tRCU_SOFTIRQ processing moved to rcuc kthreads.\n");
 	if (IS_ENABLED(CONFIG_RCU_EQS_DEBUG))
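
For readers who want to experiment with the idea outside the kernel, here is a minimal userspace sketch of the two pieces the commit log describes: clamping the patience value to the zero-to-five-second range, and testing whether a grace period is still younger than that patience. Everything prefixed with sketch_ or SKETCH_ is hypothetical and exists only for illustration; in particular, the fixed tick rate is an assumption, and the age test is written as "now is before gp_start + patience", which is one reading of the commit log's description rather than a copy of the hunk above.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel definitions; this is not kernel code. */
#define SKETCH_HZ 1000UL             /* assumed tick rate: one tick per millisecond */
#define SKETCH_MAX_PATIENCE_MS 5000  /* five-second cap from the documentation text */

/* Clamp a requested patience (in ms) into the range [0, 5000]. */
static int sketch_clamp_patience_ms(int ms)
{
	if (ms < 0)
		return 0;
	if (ms > SKETCH_MAX_PATIENCE_MS)
		return SKETCH_MAX_PATIENCE_MS;
	return ms;
}

/* Convert non-negative milliseconds to ticks, rounding up. */
static unsigned long sketch_ms_to_ticks(int ms)
{
	return ((unsigned long)ms * SKETCH_HZ + 999UL) / 1000UL;
}

/* Wraparound-tolerant "a is before b" for a free-running tick counter. */
static bool sketch_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * True while the grace period that began at gp_start is younger than
 * patience_ticks, that is, while a wakeup could reasonably be deferred.
 */
static bool sketch_gp_too_young(unsigned long now, unsigned long gp_start,
				unsigned long patience_ticks)
{
	return sketch_time_before(now, gp_start + patience_ticks);
}
```

With a patience of 1000 ms at the assumed tick rate, a grace period that started 50 ticks before "now" counts as too young, one that started 2000 ticks before does not, and a patience of zero never defers anything, matching the default behavior described in the commit log.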