Date: Fri, 27 Oct 2023 16:41:30 -0700
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Peter Zijlstra
Cc: Frederic Weisbecker, LKML, Boqun Feng, Joel Fernandes,
	Josh Triplett, Mathieu Desnoyers, Neeraj Upadhyay, Steven Rostedt,
	Uladzislau Rezki, rcu, Zqiang, Liam R. Howlett
Subject: Re: [PATCH 2/4] rcu/tasks: Handle new PF_IDLE semantics
Message-ID: <200c57ce-90a7-418b-9527-602dbf64231f@paulmck-laptop>
Reply-To: paulmck@kernel.org
References: <20231027144050.110601-1-frederic@kernel.org>
	<20231027144050.110601-3-frederic@kernel.org>
	<20231027192026.GG26550@noisy.programming.kicks-ass.net>
	<2a0d52a5-5c28-498a-8df7-789f020e36ed@paulmck-laptop>
	<20231027224628.GI26550@noisy.programming.kicks-ass.net>
In-Reply-To: <20231027224628.GI26550@noisy.programming.kicks-ass.net>

On Sat, Oct 28, 2023 at 12:46:28AM +0200, Peter Zijlstra wrote:
> On Fri, Oct 27, 2023 at 02:23:56PM -0700, Paul E. McKenney wrote:
> > On Fri, Oct 27, 2023 at 09:20:26PM +0200, Peter Zijlstra wrote:
> > > On Fri, Oct 27, 2023 at 04:40:48PM +0200, Frederic Weisbecker wrote:
> > > > +	/* Has the task been seen voluntarily sleeping? */
> > > > +	if (!READ_ONCE(t->on_rq))
> > > > +		return false;
> > >
> > > > -	if (t != current && READ_ONCE(t->on_rq) && !is_idle_task(t)) {
> > >
> > > AFAICT this ->on_rq usage is outside of scheduler locks and that
> > > READ_ONCE isn't going to help much.
> > >
> > > Obviously a pre-existing issue, and I suppose all it cares about is
> > > seeing a 0 or not, irrespective of the races, but urgh..
> >
> > The trick is that RCU Tasks only needs to spot a task voluntarily blocked
> > once at any point in the grace period.  The beginning and end of the
> > grace-period process have full barriers, so if this code sees t->on_rq
> > equal to zero, we know that the task was voluntarily blocked at some
> > point during the grace period, as required.
> >
> > In theory, we could acquire a scheduler lock, but in practice this would
> > cause CPU-latency problems at a certain set of large datacenters, and
> > for once, not the datacenters operated by my employer.
> >
> > In theory, we could make separate lists of tasks that we need to wait on,
> > thus avoiding the need to scan the full task list, but in practice this
> > would require a synchronized linked-list operation on every voluntary
> > context switch, both in and out.
> >
> > In theory, the task list could be sharded, so that it could be scanned
> > incrementally, but in practice, this is a bit non-trivial.  Though this
> > particular use case doesn't care about new tasks, so it could live with
> > something simpler than would be required for certain types of signal
> > delivery.
> >
> > In theory, we could place rcu_segcblist-like mid pointers into the
> > task list, so that scans could restart from any mid pointer.  Care is
> > required because the mid pointers would likely need to be recycled as
> > new tasks are added.  Plus care is needed because it has been a good
> > long time since I have looked at the code managing the tasks list,
> > and I am probably woefully out of date on how it all works.
> >
> > So, is there a better way?
> Nah, this is more or less what I feared.  I just worry people will come
> around and put WRITE_ONCE() on the other end.  I don't think that'll buy
> us much.  Nor do I think the current READ_ONCE()s actually matter.

My friend, you trust compilers more than I ever will.  ;-)

> But perhaps put a comment there, that we don't care for the races and
> only need to observe a 0 once or something.

There are these two passages in the big block comment preceding the
RCU Tasks code:

// rcu_tasks_pregp_step():
//	Invokes synchronize_rcu() in order to wait for all in-flight
//	t->on_rq and t->nvcsw transitions to complete.  This works because
//	all such transitions are carried out with interrupts disabled.

and:

// rcu_tasks_postgp():
//	Invokes synchronize_rcu() in order to ensure that all prior
//	t->on_rq and t->nvcsw transitions are seen by all CPUs and tasks
//	to have happened before the end of this RCU Tasks grace period.
//	Again, this works because all such transitions are carried out
//	with interrupts disabled.

The rcu_tasks_pregp_step() function contains this comment:

	/*
	 * Wait for all pre-existing t->on_rq and t->nvcsw transitions
	 * to complete.  Invoking synchronize_rcu() suffices because all
	 * these transitions occur with interrupts disabled.  Without this
	 * synchronize_rcu(), a read-side critical section that started
	 * before the grace period might be incorrectly seen as having
	 * started after the grace period.
	 *
	 * This synchronize_rcu() also dispenses with the need for a
	 * memory barrier on the first store to t->rcu_tasks_holdout,
	 * as it forces the store to happen after the beginning of the
	 * grace period.
	 */

And the rcu_tasks_postgp() function contains this comment:

	/*
	 * Because ->on_rq and ->nvcsw are not guaranteed to have a full
	 * memory barriers prior to them in the schedule() path, memory
	 * reordering on other CPUs could cause their RCU-tasks read-side
	 * critical sections to extend past the end of the grace period.
	 * However, because these ->nvcsw updates are carried out with
	 * interrupts disabled, we can use synchronize_rcu() to force the
	 * needed ordering on all such CPUs.
	 *
	 * This synchronize_rcu() also confines all ->rcu_tasks_holdout
	 * accesses to be within the grace period, avoiding the need for
	 * memory barriers for ->rcu_tasks_holdout accesses.
	 *
	 * In addition, this synchronize_rcu() waits for exiting tasks
	 * to complete their final preempt_disable() region of execution,
	 * cleaning up after synchronize_srcu(&tasks_rcu_exit_srcu),
	 * enforcing the whole region before tasklist removal until
	 * the final schedule() with TASK_DEAD state to be an RCU TASKS
	 * read side critical section.
	 */

Does that suffice, or should we add more?

							Thanx, Paul
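
[A minimal sketch of the kind of comment Peter is asking for, attached
directly to the check from Frederic's patch.  Illustrative only: the
helper name rcu_tasks_is_holdout() and the function's exact shape are
assumptions rather than the wording that necessarily landed upstream;
the reasoning in the comment is taken from Paul's explanation above.]

	/* Is the task still provisionally blocking the grace period? */
	static bool rcu_tasks_is_holdout(struct task_struct *t)
	{
		/*
		 * This ->on_rq read is intentionally racy: it runs
		 * without any scheduler lock, so it can race with
		 * ->on_rq transitions.  That is fine, because the grace
		 * period only needs to observe t->on_rq == 0 once at any
		 * point during the grace period.  The synchronize_rcu()
		 * calls in rcu_tasks_pregp_step() and rcu_tasks_postgp()
		 * supply the required ordering, because all ->on_rq
		 * transitions are carried out with interrupts disabled.
		 * No WRITE_ONCE() is needed on the update side.
		 */
		if (!READ_ONCE(t->on_rq))
			return false;
		return true;
	}

[Putting the comment on the racy read itself, rather than only in the
big block comment, means a future reader tempted to add WRITE_ONCE()
on the update side meets the rationale exactly where the race appears.]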