Date: Tue, 7 May 2024 14:37:57 -0700
From: "Paul E. McKenney"
To: Sean Christopherson
Cc: Leonardo Bras, Paolo Bonzini, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Marcelo Tosatti,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org
Subject: Re: [RFC PATCH v1 0/2] Avoid rcu_core() if CPU just left guest vcpu
Message-ID: <0e239143-65ed-445a-9782-e905527ea572@paulmck-laptop>
Reply-To: paulmck@kernel.org
References: <20240328171949.743211-1-leobras@redhat.com>
	<3b2c222b-9ef7-43e2-8ab3-653a5ee824d4@paulmck-laptop>
	<663a659d-3a6f-4bec-a84b-4dd5fd16c3c1@paulmck-laptop>

On Tue, May 07, 2024 at 02:00:12PM -0700, Sean Christopherson wrote:
> On Tue, May 07, 2024, Paul E. McKenney wrote:
> > On Tue, May 07, 2024 at 10:55:54AM -0700, Sean Christopherson wrote:
> > > On Fri, May 03, 2024, Paul E. McKenney wrote:
> > > > On Fri, May 03, 2024 at 02:29:57PM -0700, Sean Christopherson wrote:
> > > > > So if we're comfortable relying on the 1 second timeout to guard against a
> > > > > misbehaving userspace, IMO we might as well fully rely on that guardrail. I.e.
> > > > > add a generic PF_xxx flag (or whatever flag location is most appropriate) to let
> > > > > userspace communicate to the kernel that it's a real-time task that spends the
> > > > > overwhelming majority of its time in userspace or guest context, i.e. should be
> > > > > given extra leniency with respect to rcuc if the task happens to be interrupted
> > > > > while it's in kernel context.
> > > >
> > > > But if the task is executing in host kernel context for quite some time,
> > > > then the host kernel's RCU really does need to take evasive action.
> > >
> > > Agreed, but what I'm saying is that RCU already has the mechanism to do so in the
> > > form of the 1 second timeout.
> >
> > Plus RCU will force-enable that CPU's scheduler-clock tick after about
> > ten milliseconds of that CPU not being in a quiescent state, with
> > the time varying depending on the value of HZ and the number of CPUs.
> > After about ten seconds (halfway to the RCU CPU stall warning), it will
> > resched_cpu() that CPU every few milliseconds.
> >
> > > And while KVM does not guarantee that it will immediately resume the guest after
> > > servicing the IRQ, neither does the existing userspace logic. E.g. I don't see
> > > anything that would prevent the kernel from preempting the interrupt task.
> >
> > Similarly, the hypervisor could preempt a guest OS's RCU read-side
> > critical section or its preempt_disable() code.
> >
> > Or am I missing your point?
>
> I think you're missing my point? I'm talking specifically about host RCU, what
> is or isn't happening in the guest is completely out of scope.

Ah, I was thinking of nested virtualization.

> My overarching point is that the existing @user check in rcu_pending() is optimistic,
> in the sense that the CPU is _likely_ to quickly enter a quiescent state if @user
> is true, but it's not 100% guaranteed. And because it's not guaranteed, RCU has
> the aforementioned guardrails.

You lost me on this one.

The "user" argument to rcu_pending() comes from the context saved at
the time of the scheduling-clock interrupt. In other words, the CPU
really was executing in user mode (which is an RCU quiescent state)
when the interrupt arrived. And that suffices, 100% guaranteed.

The reason that it suffices is that other RCU code such as rcu_qs() and
rcu_note_context_switch() ensure that this CPU does not pay attention to
the user-argument-induced quiescent state unless this CPU had previously
acknowledged the current grace period.
And if the CPU has previously acknowledged the current grace period, that
acknowledgement must have preceded the interrupt from user-mode execution.
Thus the prior quiescent state represented by that user-mode execution
applies to that previously acknowledged grace period.

This is admittedly a bit indirect, but then again this is Linux-kernel
RCU that we are talking about.

> And I'm arguing that, since the @user check isn't bombproof, there's no reason to
> try to harden against every possible edge case in an equivalent @guest check,
> because it's unnecessary for kernel safety, thanks to the guardrails.

And the same argument above would also apply to an equivalent check for
execution in guest mode at the time of the interrupt.

Please understand that I am not saying that we absolutely need an
additional check (you tell me!). But if we do need RCU to be more
aggressive about treating guest execution as an RCU quiescent state
within the host, that additional check would be an excellent way of
making that happen.

							Thanx, Paul
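
For readers who want the "acknowledged grace period" argument above in
concrete form, here is a rough userspace sketch in C. It is a toy model,
not kernel code: struct toy_cpu, toy_note_new_gp(), toy_tick(), and
toy_scenario() are hypothetical names invented for illustration, and the
real logic is spread across rcu_qs(), rcu_note_context_switch(), and
related functions. The only thing it tries to show is that a "user"
interrupt counts as a quiescent state only for a grace period that this
CPU acknowledged beforehand.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-CPU state. Field names are hypothetical, not the kernel's. */
struct toy_cpu {
	unsigned long gp_seen;	/* last grace period this CPU acknowledged */
	bool qs_needed;		/* CPU still owes a quiescent state for gp_seen */
};

/* The CPU notices (acknowledges) grace period gp_cur from kernel context. */
void toy_note_new_gp(struct toy_cpu *cpu, unsigned long gp_cur)
{
	if (cpu->gp_seen != gp_cur) {
		cpu->gp_seen = gp_cur;
		cpu->qs_needed = true;
	}
}

/*
 * Scheduling-clock interrupt. "user" is true if the interrupt arrived
 * while the CPU was executing in user mode. Returns true only if a
 * quiescent state was recorded for the current grace period, which
 * requires that the CPU acknowledged that grace period earlier.
 */
bool toy_tick(struct toy_cpu *cpu, unsigned long gp_cur, bool user)
{
	if (user && cpu->qs_needed && cpu->gp_seen == gp_cur) {
		cpu->qs_needed = false;
		return true;
	}
	return false;
}

/* Walk through the cases discussed in the text. */
void toy_scenario(void)
{
	struct toy_cpu cpu = { .gp_seen = 0, .qs_needed = false };

	/* Grace period 1 not yet acknowledged: the user-mode tick is ignored. */
	assert(!toy_tick(&cpu, 1, true));

	/* Acknowledge grace period 1, then take a tick from kernel mode. */
	toy_note_new_gp(&cpu, 1);
	assert(!toy_tick(&cpu, 1, false));

	/* A tick from user mode now counts, exactly once. */
	assert(toy_tick(&cpu, 1, true));
	assert(!toy_tick(&cpu, 1, true));
}
```

In this model the acknowledgement necessarily happens before the tick
that reports the quiescent state, which is the ordering the argument
relies on. An equivalent "guest" flag could be passed to toy_tick() in
the same way, but whether the host needs that is the open question in
this thread.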