From: Thomas Gleixner
To: paulmck@kernel.org, Ankur Arora
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
    torvalds@linux-foundation.org, linux-mm@kvack.org, x86@kernel.org,
    akpm@linux-foundation.org, luto@kernel.org, bp@alien8.de,
    dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
    juri.lelli@redhat.com, vincent.guittot@linaro.org, willy@infradead.org,
    mgorman@suse.de, jon.grimm@amd.com, bharata@amd.com,
    raghavendra.kt@amd.com, boris.ostrovsky@oracle.com,
    konrad.wilk@oracle.com, jgross@suse.com, andrew.cooper3@citrix.com,
    mingo@kernel.org, bristot@kernel.org, mathieu.desnoyers@efficios.com,
    geert@linux-m68k.org, glaubitz@physik.fu-berlin.de,
    anton.ivanov@cambridgegreys.com, mattst88@gmail.com,
    krypton@ulrich-teichert.org, rostedt@goodmis.org,
    David.Laight@aculab.com, richard@nod.at, mjguzik@gmail.com
Subject: Re: [RFC PATCH 48/86] rcu: handle quiescent states for PREEMPT_RCU=n
In-Reply-To: <2027da00-273d-41cf-b9e7-460776181083@paulmck-laptop>
References: <20231107215742.363031-1-ankur.a.arora@oracle.com>
    <20231107215742.363031-49-ankur.a.arora@oracle.com>
    <2027da00-273d-41cf-b9e7-460776181083@paulmck-laptop>
Date: Tue, 28 Nov 2023 18:04:33 +0100
Message-ID: <87v89lzu5a.ffs@tglx>

Paul!

On Mon, Nov 20 2023 at 16:38, Paul E. McKenney wrote:
> But...
>
> Suppose we have a long-running loop in the kernel that regularly
> enables preemption, but only momentarily.  Then the added
> rcu_flavor_sched_clock_irq() check would almost always fail, making
> for extremely long grace periods.  Or did I miss a change that causes
> preempt_enable() to help RCU out?

So first of all this is not any different from today and even with
RCU_PREEMPT=y a tight loop:

    do {
        preempt_disable();
        do_stuff();
        preempt_enable();
    }

will not allow rcu_flavor_sched_clock_irq() to detect QS reliably. All
it can do is to force reschedule/preemption after some time, which in
turn ends up in a QS.

The current NONE/VOLUNTARY models, which imply RCU_PREEMPT=n, cannot do
that at all because the preempt_enable() is a NOOP and there is no
preemption point at return from interrupt to kernel.

    do {
        do_stuff();
    }

So the only thing which makes that "work" is slapping a cond_resched()
into the loop:

    do {
        do_stuff();
        cond_resched();
    }

But the whole concept behind LAZY is that the loop will always be:

    do {
        preempt_disable();
        do_stuff();
        preempt_enable();
    }

and the preempt_enable() will always be a functional preemption point.
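
To make "functional preemption point" concrete, here is a minimal
userspace model. It is a sketch only, not kernel code: every model_*
name is made up for illustration, a plain counter and flag stand in for
preempt_count and the NEED_RESCHED bit, and a "tick" is simulated by
setting the flag at one chosen loop iteration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for per-CPU kernel state. */
static int  model_preempt_count;  /* nesting depth of preempt_disable() */
static bool model_need_resched;   /* models the NEED_RESCHED bit        */
static int  model_qs_count;       /* quiescent states reported so far   */

/* A context switch counts as a quiescent state in this model. */
static void model_schedule(void)
{
    model_need_resched = false;
    model_qs_count++;
}

static void model_preempt_disable(void)
{
    model_preempt_count++;
}

/* Functional preemption point: act on the flag once the count drops to 0. */
static void model_preempt_enable(void)
{
    assert(model_preempt_count > 0);
    if (--model_preempt_count == 0 && model_need_resched)
        model_schedule();
}

/*
 * Run the loop for 'iterations' rounds. A simulated tick sets the
 * reschedule flag during round 'tick_at'. With functional == false the
 * disable/enable pair is a NOOP and nothing ever looks at the flag.
 * Returns the number of quiescent states reported.
 */
static int model_run_loop(bool functional, int iterations, int tick_at)
{
    model_preempt_count = 0;
    model_need_resched = false;
    model_qs_count = 0;

    for (int i = 0; i < iterations; i++) {
        if (functional)
            model_preempt_disable();
        if (i == tick_at)           /* tick lands inside do_stuff() */
            model_need_resched = true;
        if (functional)
            model_preempt_enable();
    }
    return model_qs_count;
}
```

In this model, model_run_loop(false, 10, 3) reports no quiescent state,
because the flag is set and never evaluated, while
model_run_loop(true, 10, 3) reports exactly one at the first
model_preempt_enable() after the tick.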

So let's look at the simple case where more than one task is runnable on
a given CPU:

    loop()

      preempt_disable();

      --> tick interrupt
          set NEED_RESCHED_LAZY

      preempt_enable() -> Does nothing because NEED_RESCHED is not set

      preempt_disable();

      --> tick interrupt
          set NEED_RESCHED

      preempt_enable()
        preempt_schedule()
          schedule()
            report_QS()

which means that on the second tick a quiescent state is reported.
Whether a full tick is really granted is a scheduler decision and an
implementation detail, and not really relevant for discussing the
concept.

Now the problematic case is when there is only one task runnable on a
given CPU because then the tick interrupt will set neither of the
preemption bits. Which is fine from a scheduler perspective, but not so
much from an RCU perspective.

But the whole point of LAZY is to be able to enforce rescheduling at the
next possible preemption point. So RCU can utilize that too:

    void rcu_flavor_sched_clock_irq(bool user)
    {
        if (user || rcu_is_cpu_rrupt_from_idle() ||
            !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) {
            rcu_qs();
            return;
        }

        if (this_cpu_read(rcu_data.rcu_urgent_qs))
            set_need_resched();
    }

So:

    loop()

      preempt_disable();

      --> tick interrupt
          rcu_flavor_sched_clock_irq()
            sets NEED_RESCHED

      preempt_enable()
        preempt_schedule()
          schedule()
            report_QS()

See? No magic nonsense in preempt_enable(), no cond_resched(), nothing.

The above rcu_flavor_sched_clock_irq() check for rcu_data.rcu_urgent_qs
is not really fundamentally different from the check in rcu_all_qs().
The main difference is that it is bound to the tick, so the
detection/action might be delayed by a tick. If that turns out to be a
problem, then this stuff has far more serious issues underneath.

So now you might argue that for a loop like this:

    do {
        mutex_lock();
        do_stuff();
        mutex_unlock();
    }

the ideal preemption point is post mutex_unlock(), which is where
someone would mindfully (*cough*) place a cond_resched(), right?

So if that turns out to matter in reality and not just by academic
inspection, then we are far better off to annotate such code with:

    do {
        preempt_lazy_disable();
        mutex_lock();
        do_stuff();
        mutex_unlock();
        preempt_lazy_enable();
    }

and let preempt_lazy_enable() evaluate the NEED_RESCHED_LAZY bit.

Then rcu_flavor_sched_clock_irq(bool user) can use a two stage approach
like the scheduler:

    void rcu_flavor_sched_clock_irq(bool user)
    {
        if (user || rcu_is_cpu_rrupt_from_idle() ||
            !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) {
            rcu_qs();
            return;
        }

        if (this_cpu_read(rcu_data.rcu_urgent_qs)) {
            if (!need_resched_lazy())
                set_need_resched_lazy();
            else
                set_need_resched();
        }
    }

But for a start I would just use the trivial

        if (this_cpu_read(rcu_data.rcu_urgent_qs))
            set_need_resched();

approach and see where this gets us.

With the approach I suggested to Ankur, i.e. having PREEMPT_AUTO (or
LAZY) as a config option, we can work on the details of the AUTO and
RCU_PREEMPT=n flavour up to the point where we are happy to get rid of
the whole zoo of config options altogether.

Just insisting that RCU_PREEMPT=n requires cond_resched() and whatsoever
is not really getting us anywhere.

Thanks,

        tglx
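
For illustration, the difference between the trivial and the two stage
tick handler can be played through in a small userspace model. This is a
sketch under loud assumptions, not kernel code: all model_* names are
hypothetical, the "in a preempt disabled section" test is reduced to a
plain counter (the real check also covers user mode, idle and softirq),
and the tick is assumed to always land inside the preempt disabled part
of the tight loop on a CPU with a single runnable task and
rcu_urgent_qs set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state; a simplification of the real thing. */
struct model_cpu {
    int  preempt_count;
    bool need_resched_lazy;
    bool need_resched;
    bool rcu_urgent_qs;
    int  qs_count;
};

static void model_rcu_qs(struct model_cpu *c)
{
    c->rcu_urgent_qs = false;
    c->qs_count++;
}

/* A context switch clears both bits and is a quiescent state. */
static void model_schedule(struct model_cpu *c)
{
    c->need_resched = false;
    c->need_resched_lazy = false;
    model_rcu_qs(c);
}

static void model_preempt_disable(struct model_cpu *c)
{
    c->preempt_count++;
}

static void model_preempt_enable(struct model_cpu *c)
{
    assert(c->preempt_count > 0);
    if (--c->preempt_count == 0 && c->need_resched)
        model_schedule(c);
}

/* Models the tick check: trivial variant or two stage variant. */
static void model_rcu_tick(struct model_cpu *c, bool two_stage)
{
    if (c->preempt_count == 0) {        /* simplified "not in a section" */
        model_rcu_qs(c);
        return;
    }
    if (!c->rcu_urgent_qs)
        return;
    if (two_stage && !c->need_resched_lazy)
        c->need_resched_lazy = true;    /* first stage: only ask nicely */
    else
        c->need_resched = true;         /* second stage, or trivial variant */
}

/*
 * Tight preempt_disable()/preempt_enable() loop with one tick per
 * 'iters_per_tick' rounds. Returns the number of ticks taken until a
 * quiescent state is reported, or -1 if none shows up in 'max_ticks'.
 */
static int model_ticks_until_qs(bool two_stage, int iters_per_tick,
                                int max_ticks)
{
    struct model_cpu c = { .rcu_urgent_qs = true };
    int ticks = 0;

    if (iters_per_tick < 1)
        return -1;

    while (ticks < max_ticks) {
        for (int i = 0; i < iters_per_tick; i++) {
            model_preempt_disable(&c);
            if (i == iters_per_tick - 1) {
                model_rcu_tick(&c, two_stage);
                ticks++;
            }
            model_preempt_enable(&c);
            if (c.qs_count)
                return ticks;
        }
    }
    return -1;
}
```

In this model the trivial variant reports the quiescent state after one
tick and the two stage variant after two, which is the extra tick of
delay the two stage approach trades for a gentler first request.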