Date: Wed, 2 Aug 2023 10:48:53 -0700
From: "Paul E. McKenney"
To: Linus Torvalds
Cc: Guenter Roeck, Peter Zijlstra, Roy Hopkins, Joel Fernandes,
	Pavel Machek, Greg Kroah-Hartman, stable@vger.kernel.org,
	patches@lists.linux.dev, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, shuah@kernel.org, patches@kernelci.org,
	lkft-triage@lists.linaro.org, jonathanh@nvidia.com,
	f.fainelli@gmail.com, sudipm.mukherjee@gmail.com,
	srw@sladewatkins.net, rwarsow@gmx.de, conor@kernel.org,
	rcu@vger.kernel.org, Ingo Molnar
Subject: Re: scheduler problems in -next (was: Re: [PATCH 6.4 000/227] 6.4.7-rc1 review)
Message-ID: <8ab3ca72-e20c-4b18-803f-bf6937c2cd70@paulmck-laptop>
Reply-To: paulmck@kernel.org
References: <20230731143954.GB37820@hirez.programming.kicks-ass.net>
	<20230731145232.GM29590@hirez.programming.kicks-ass.net>
	<7ff2a2393d78275b14ff867f3af902b5d4b93ea2.camel@suse.de>
	<20230731161452.GA40850@hirez.programming.kicks-ass.net>
	<20230731211517.GA51835@hirez.programming.kicks-ass.net>
	<8215f037-63e9-4e92-8403-c5431ada9cc9@paulmck-laptop>

On Wed, Aug 02, 2023 at 10:14:51AM -0700, Linus Torvalds wrote:
> Two quick comments, both of them "this code is a bit odd" rather than
> anything else.

Good to get eyes on this code, so thank you very much!!!

> On Tue, 1 Aug 2023 at 12:11, Paul E. McKenney wrote:
> >
> > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
>
> Why is this file called "tasks.h"?
>
> It's not a header file. It makes no sense. It's full of C code. It's
> included in only one place. It's just _weird_.

You are right, it is weird.  This is a holdover from when I was much
more concerned about being criticized for having #ifdef in a .c file,
and pretty much every line in this file is under some combination or
another of #ifdefs.  This concern led to kernel/rcu/tree_plugin.h being
set up in this way back when preemptible RCU was introduced, and for
good or for bad I just kept following that pattern.

We could convert this to a .c file, keep the #ifdefs, drop some instances
of "static", add a bunch of declarations, and maybe (or maybe not) push
a function or two into some .h file for performance/inlining reasons.

Me, I would prefer to leave it alone, but we can certainly change it.

> However, more relevantly:
>
> > +	mutex_unlock(&rtp->tasks_gp_mutex);
> > 	set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
>
> Isn't the tasks_gp_mutex the thing that protects the gp state here?
> Shouldn't it be after setting?

Much of the gp state is protected by being accessed only by the gp
kthread.  But there is a window in time where the gp might be driven
directly out of the synchronize_rcu_tasks() call.  That window in time
does not have a definite end, so this ->tasks_gp_mutex does the needed
mutual exclusion during the transition of gp processing to the newly
created gp kthread.
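In case a concrete shape helps, below is a toy userspace approximation
of that handoff, using pthreads and invented names (toy_synchronize(),
gp_kthread_fn(), and so on) rather than the actual tasks.h code, so
please treat it as illustration only:

	#include <pthread.h>
	#include <stdbool.h>
	#include <stdio.h>

	static pthread_mutex_t gp_mutex = PTHREAD_MUTEX_INITIALIZER;
	static bool gp_kthread_started;

	/* Stand-in for real grace-period processing. */
	static void do_one_gp(const char *who)
	{
		printf("grace period driven by %s\n", who);
	}

	/* Updater path: might run before the GP kthread exists. */
	static void toy_synchronize(void)
	{
		pthread_mutex_lock(&gp_mutex);
		if (!gp_kthread_started)
			do_one_gp("updater");	/* drive the GP directly */
		pthread_mutex_unlock(&gp_mutex);
	}

	/* GP kthread: takes over GP processing under the same mutex. */
	static void *gp_kthread_fn(void *arg)
	{
		(void)arg;
		pthread_mutex_lock(&gp_mutex);
		gp_kthread_started = true;	/* handoff now complete */
		pthread_mutex_unlock(&gp_mutex);
		do_one_gp("kthread");
		return NULL;
	}

	int main(void)
	{
		pthread_t tid;

		toy_synchronize();	/* before the kthread: direct drive */
		pthread_create(&tid, NULL, gp_kthread_fn, NULL);
		pthread_join(tid, NULL);
		toy_synchronize();	/* after the kthread: no direct drive */
		return 0;
	}

The real code of course has more moving parts, but the single mutex
covering both the direct-drive path and the kthread's takeover is the
point at issue here.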
> >		rcuwait_wait_event(&rtp->cbs_wait,
> >				   (needgpcb = rcu_tasks_need_gpcb(rtp)),
> >				   TASK_IDLE);
>
> Also, looking at rcu_tasks_need_gpcb() that is now called outside the
> lock, it does something quite odd.

The state of each callback list is protected by the ->lock field of the
rcu_tasks_percpu structure.  Yes, rcu_segcblist_n_cbs() is invoked in
rcu_tasks_need_gpcb() outside of that lock, but it is designed for
lockless use.  If the list is modified just after the check, then either
there will be a later wakeup on the one hand, or we will just uselessly
acquire that ->lock this one time on the other.  (A toy userspace sketch
of this tolerate-staleness pattern appears after the patch below.)

Also, ncbs records the number of callbacks seen in that first loop and
is used later, where its value might be stale.  This might result in a
collapse back to single-callback-queue operation and a later expansion
back up.  Except that at this point we are still in single-CPU mode,
so there should not be any lock contention, which means that there
should still be but a single callback queue.  The transition itself is
protected by ->cbs_gbl_lock.

> At the very top of the function does
>
>	for (cpu = 0; cpu < smp_load_acquire(&rtp->percpu_dequeue_lim); cpu++) {
>
> and 'smp_load_acquire()' is all about saying "everything *after* this
> load is ordered,
>
> But the way it is done in that loop, it is indeed done at the
> beginning of the loop, but then it's done *after* the loop too, so the
> last smp_load_acquire seems a bit nonsensical.
>
> If you want to load a value and say "this value is now sensible for
> everything that follows", I think you should load it *first*. No?
>
> IOW, wouldn't the whole sequence make more sense as
>
>	dequeue_limit = smp_load_acquire(&rtp->percpu_dequeue_lim);
>	for (cpu = 0; cpu < dequeue_limit; cpu++) {
>
> and say that everything in rcu_tasks_need_gpcb() is ordered wrt the
> initial limit on entry?
>
> I dunno. That use of "smp_load_acquire()" just seems odd. Memory
> ordering is hard to understand to begin with, but then when you have
> things like loops that do the same ordered load multiple times, it
> goes from "hard to understand" to positively confusing.

Excellent point.  I am queueing that change with your Suggested-by.
If testing goes well, it will be as shown below.

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 83049a893de5..94bb5abdbb37 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -432,6 +432,7 @@ static void rcu_barrier_tasks_generic(struct rcu_tasks *rtp)
 static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp)
 {
 	int cpu;
+	int dequeue_limit;
 	unsigned long flags;
 	bool gpdone = poll_state_synchronize_rcu(rtp->percpu_dequeue_gpseq);
 	long n;
@@ -439,7 +440,8 @@ static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp)
 	long ncbsnz = 0;
 	int needgpcb = 0;
 
-	for (cpu = 0; cpu < smp_load_acquire(&rtp->percpu_dequeue_lim); cpu++) {
+	dequeue_limit = smp_load_acquire(&rtp->percpu_dequeue_lim);
+	for (cpu = 0; cpu < dequeue_limit; cpu++) {
 		struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu);
 
 		/* Advance and accelerate any new callbacks. */
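For completeness, here is the toy sketch of the lockless-check pattern
mentioned above.  It uses C11 atomics and a pthread mutex in userspace
with invented names (toy_cblist, toy_need_gp(), and so on); it is not
rcu_segcblist_n_cbs() or rcu_tasks_need_gpcb() themselves, just the
shape of the argument that a stale lockless read is harmless here:

	#include <pthread.h>
	#include <stdatomic.h>
	#include <stdbool.h>

	struct toy_cblist {
		pthread_mutex_t lock;	/* protects the list itself */
		atomic_long n_cbs;	/* may be read without ->lock */
	};

	static struct toy_cblist toy_list = { .lock = PTHREAD_MUTEX_INITIALIZER };

	/* Enqueue side: the count is updated while holding the lock. */
	static void toy_enqueue(struct toy_cblist *cl)
	{
		pthread_mutex_lock(&cl->lock);
		/* ...append the callback to the list... */
		atomic_fetch_add_explicit(&cl->n_cbs, 1, memory_order_relaxed);
		pthread_mutex_unlock(&cl->lock);
	}

	/*
	 * Polling side: the lockless read may be stale.  Racing with an
	 * enqueue means either a later wakeup picks up the new callback
	 * or the lock is taken once needlessly, neither of which is a
	 * correctness problem.
	 */
	static bool toy_need_gp(struct toy_cblist *cl)
	{
		if (!atomic_load_explicit(&cl->n_cbs, memory_order_relaxed))
			return false;	/* nothing visible at check time */
		pthread_mutex_lock(&cl->lock);
		/* ...advance and accelerate callbacks under the lock... */
		pthread_mutex_unlock(&cl->lock);
		return true;
	}

	int main(void)
	{
		toy_enqueue(&toy_list);
		return toy_need_gp(&toy_list) ? 0 : 1;
	}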