Date: Wed, 18 Oct 2023 13:42:21 -0700
From: "Paul E. McKenney"
To: Ankur Arora
Cc: Thomas Gleixner, Linus Torvalds, Peter Zijlstra,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
    akpm@linux-foundation.org, luto@kernel.org, bp@alien8.de,
    dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
    juri.lelli@redhat.com, vincent.guittot@linaro.org, willy@infradead.org,
    mgorman@suse.de, rostedt@goodmis.org, jon.grimm@amd.com, bharata@amd.com,
    raghavendra.kt@amd.com, boris.ostrovsky@oracle.com, konrad.wilk@oracle.com,
    jgross@suse.com, andrew.cooper3@citrix.com, Frederic Weisbecker
Subject: Re: [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED
Reply-To: paulmck@kernel.org
In-Reply-To: <87r0lroffj.fsf@oracle.com>

On Wed, Oct 18, 2023 at 01:15:28PM -0700, Ankur Arora wrote:
>
> Paul E. McKenney writes:
>
> > On Wed, Oct 18, 2023 at 03:16:12PM +0200, Thomas Gleixner wrote:
> >> Paul!
> >>
> >> On Tue, Oct 17 2023 at 18:03, Paul E. McKenney wrote:
> >> > Belatedly calling out some RCU issues. Nothing fatal, just a
> >> > (surprisingly) few adjustments that will need to be made. The key thing
> >> > to note is that from RCU's viewpoint, with this change, all kernels
> >> > are preemptible, though rcu_read_lock() readers remain
> >> > non-preemptible.
> >>
> >> Why? Either I'm confused or you or both of us :)
> >
> > Isn't rcu_read_lock() defined as preempt_disable() and rcu_read_unlock()
> > as preempt_enable() in this approach? I certainly hope so, as RCU
> > priority boosting would be a most unwelcome addition to many datacenter
> > workloads.
>
> No, in this approach, PREEMPT_AUTO selects PREEMPTION and thus
> PREEMPT_RCU so rcu_read_lock/unlock() would touch the
> rcu_read_lock_nesting. Which is identical to what PREEMPT_DYNAMIC does.

Understood. And we need some way to build a kernel such that RCU
read-side critical sections are non-preemptible. This is a hard
requirement that is not going away anytime soon.

> >> With this approach the kernel is by definition fully preemptible, which
> >> means rcu_read_lock() is preemptible too. That's pretty much the
> >> same situation as with PREEMPT_DYNAMIC.
> >
> > Please, just no!!!
> >
> > Please note that the current use of PREEMPT_DYNAMIC with preempt=none
> > avoids preempting RCU read-side critical sections. This means that the
> > distro use of PREEMPT_DYNAMIC has most definitely *not* tested preemption
> > of RCU readers in environments expecting no preemption.
>
> Ah. So, though PREEMPT_DYNAMIC with preempt=none runs with PREEMPT_RCU,
> preempt=none stubs out the actual preemption via __preempt_schedule.
>
> Okay, I see what you are saying.

More to the point, currently, you can build with CONFIG_PREEMPT_DYNAMIC=n
and CONFIG_PREEMPT_NONE=y and have non-preemptible RCU read-side critical
sections.

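For readers following along, here is a minimal user-space sketch of the
two reader flavors being contrasted in this thread. It is a model under
stated assumptions, not kernel code: struct rcu_model_task and every
model_* name are hypothetical, and the non-preemptible flavor is modelled
as a plain counter increment, following the description of
rcu_read_lock() as preempt_disable() given above.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-task state discussed in the thread. */
struct rcu_model_task {
	int preempt_count;         /* nonzero: task must not be preempted */
	int rcu_read_lock_nesting; /* nonzero: inside a preemptible-RCU reader */
};

/* Non-preemptible flavor: the reader is modelled as a preempt-disabled region. */
static void model_rcu_read_lock_nonpreempt(struct rcu_model_task *t)
{
	t->preempt_count++;
}

static void model_rcu_read_unlock_nonpreempt(struct rcu_model_task *t)
{
	assert(t->preempt_count > 0); /* unbalanced unlock would be a bug */
	t->preempt_count--;
}

/* Preemptible flavor: the reader only tracks its nesting depth. */
static void model_rcu_read_lock_preempt(struct rcu_model_task *t)
{
	t->rcu_read_lock_nesting++;
}

static void model_rcu_read_unlock_preempt(struct rcu_model_task *t)
{
	assert(t->rcu_read_lock_nesting > 0); /* unbalanced unlock would be a bug */
	t->rcu_read_lock_nesting--;
}

/* In this model the scheduler may preempt only when preempt_count is zero. */
static int model_preemptible(const struct rcu_model_task *t)
{
	return t->preempt_count == 0;
}
```

In this model a task inside model_rcu_read_lock_nonpreempt() reports
model_preemptible() == 0, while a task inside
model_rcu_read_lock_preempt() still reports 1. That difference is the
property at stake: only the first flavor keeps the reader from being
preempted.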
> (Side issue: but this means that even for PREEMPT_DYNAMIC preempt=none,
> _cond_resched() doesn't call rcu_all_qs().)

I have no idea if anyone runs with CONFIG_PREEMPT_DYNAMIC=y and
preempt=none. We don't do so. ;-)

> >> For throughput sake this fully preemptible kernel provides a mechanism
> >> to delay preemption for SCHED_OTHER tasks, i.e. instead of setting
> >> NEED_RESCHED the scheduler sets NEED_RESCHED_LAZY.
> >>
> >> That means the preemption points in preempt_enable() and return from
> >> interrupt to kernel will not see NEED_RESCHED and the tasks can run to
> >> completion either to the point where they call schedule() or when they
> >> return to user space. That's pretty much what PREEMPT_NONE does today.
> >>
> >> The difference to NONE/VOLUNTARY is that the explicit cond_resched()
> >> points are no longer required because the scheduler can preempt the
> >> long running task by setting NEED_RESCHED instead.
> >>
> >> That preemption might be suboptimal in some cases compared to
> >> cond_resched(), but from my initial experimentation that's not really an
> >> issue.
> >
> > I am not (repeat NOT) arguing for keeping cond_resched(). I am instead
> > arguing that the less-preemptible variants of the kernel should continue
> > to avoid preempting RCU read-side critical sections.
>
> [ snip ]
>
> >> In the end there is no CONFIG_PREEMPT_XXX anymore. The only knob
> >> remaining would be CONFIG_PREEMPT_RT, which should be renamed to
> >> CONFIG_RT or such as it does not really change the preemption
> >> model itself. RT just reduces the preemption disabled sections with the
> >> lock conversions, forced interrupt threading and some more.
> >
> > Again, please, no.
> >
> > There are situations where we still need rcu_read_lock() and
> > rcu_read_unlock() to be preempt_disable() and preempt_enable(),
> > respectively. Those can be cases selected only by Kconfig option, not
> > available in kernels compiled with CONFIG_PREEMPT_DYNAMIC=y.
>
> As far as non-preemptible RCU read-side critical sections are concerned,
> are the current
>  - PREEMPT_DYNAMIC=y, PREEMPT_RCU, preempt=none config
>    (rcu_read_lock/unlock() do not manipulate preempt_count, but do
>    stub out preempt_schedule())
>  - and PREEMPT_NONE=y, TREE_RCU config (rcu_read_lock/unlock() manipulate
>    preempt_count)?
>
> roughly similar or no?

No. There is still considerable exposure to preemptible-RCU code paths,
for example, when current->rcu_read_unlock_special.b.blocked is set.

> >> > I am sure that I am missing something, but I have not yet seen any
> >> > show-stoppers. Just some needed adjustments.
> >>
> >> Right. If it works out as I think it can work out the main adjustments
> >> are to remove a large amount of #ifdef maze and related gunk :)
> >
> > Just please don't remove the #ifdef gunk that is still needed!
>
> Always the hard part :).

Hey, we wouldn't want to insult your intelligence by letting you work
on too easy of a problem! ;-)

							Thanx, Paul
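
The lazy-reschedule mechanism quoted earlier in this message (set
NEED_RESCHED_LAZY instead of NEED_RESCHED, so that kernel preemption
points do not fire) can be sketched in user space as follows. This is a
simplified model, not the patch series' implementation: the flag values,
struct resched_model_task, and all model_* names are hypothetical, and
when the scheduler decides to request an urgent reschedule is left to the
caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real TIF_* values are architecture-specific. */
enum {
	MODEL_NEED_RESCHED      = 1u << 0, /* preempt at the next preemption point */
	MODEL_NEED_RESCHED_LAZY = 1u << 1  /* reschedule only at schedule()/user return */
};

struct resched_model_task {
	unsigned int flags;
	int preempt_count; /* nonzero: inside a preempt-disabled region */
};

/* The scheduler asks the running task to give up the CPU. */
static void model_request_resched(struct resched_model_task *t, bool urgent)
{
	t->flags |= urgent ? MODEL_NEED_RESCHED : MODEL_NEED_RESCHED_LAZY;
}

/* preempt_enable() and interrupt return to kernel look only at NEED_RESCHED. */
static bool model_preempt_point_fires(const struct resched_model_task *t)
{
	return t->preempt_count == 0 && (t->flags & MODEL_NEED_RESCHED) != 0;
}

/* Return to user space honors both the lazy and the immediate request. */
static bool model_user_return_reschedules(const struct resched_model_task *t)
{
	return (t->flags & (MODEL_NEED_RESCHED | MODEL_NEED_RESCHED_LAZY)) != 0;
}

/* Switching tasks clears any pending request. */
static void model_schedule(struct resched_model_task *t)
{
	assert(t->preempt_count == 0); /* must not schedule while preempt-disabled */
	t->flags = 0;
}
```

With only the lazy bit set, model_preempt_point_fires() stays false and
the task runs on until it schedules or returns to user space, which
matches the PREEMPT_NONE-like behavior described in the quoted text. Once
the caller passes urgent = true, the next preemption point outside a
preempt-disabled region fires.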