From: Anna-Maria Behnsen
To: linux-kernel@vger.kernel.org
Cc: Peter Zijlstra, John Stultz, Thomas Gleixner, Eric Dumazet,
    "Rafael J. Wysocki", Arjan van de Ven, "Paul E. McKenney",
    Frederic Weisbecker, Rik van Riel, Steven Rostedt, Sebastian Siewior,
    Giovanni Gherdovich, Lukasz Luba, "Gautham R. Shenoy",
    Srinivas Pandruvada, K Prateek Nayak, Anna-Maria Behnsen, Richard Cochran
Subject: [PATCH v9 22/32] timers: Keep the pinned timers separate from the others
Date: Fri, 1 Dec 2023 10:26:44 +0100
Message-Id: <20231201092654.34614-23-anna-maria@linutronix.de>
In-Reply-To: <20231201092654.34614-1-anna-maria@linutronix.de>
References: <20231201092654.34614-1-anna-maria@linutronix.de>

Separate the storage space for pinned timers. Deferrable timers (pinned or
not) are still enqueued into their own base.

This is preparatory work for changing the NOHZ timer placement from a push
at enqueue time to a pull at expiry time model.

Originally-by: Richard Cochran (linutronix GmbH)
Signed-off-by: Anna-Maria Behnsen
Reviewed-by: Frederic Weisbecker
---
v9:
  - Update required due to changes in preceding patches

v6:
  - Drop setting the TIMER_PINNED flag in add_timer_on() and drop the
    related warning. The add_timer_on() fix is split into a separate patch.
    Therefore, also drop the "Reviewed-by" from Frederic Weisbecker.

v5:
  - Add WARN_ONCE() in add_timer_on()
  - Decrease patch size by splitting into three patches (this patch and the
    two before)

v4:
  - Split out logic to forward base clock into a helper function
    forward_base_clk() (Frederic)
  - Simplify the code in run_local_timers() and timer_clear_idle() (Frederic)
---
 kernel/time/timer.c | 95 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 68 insertions(+), 27 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index eda4972ca862..366ea26ce3ba 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -187,12 +187,18 @@ EXPORT_SYMBOL(jiffies_64);
 #define WHEEL_SIZE	(LVL_SIZE * LVL_DEPTH)
 
 #ifdef CONFIG_NO_HZ_COMMON
-# define NR_BASES	2
-# define BASE_STD	0
-# define BASE_DEF	1
+/*
+ * If multiple bases need to be locked, use the base ordering for lock
+ * nesting, i.e. lowest number first.
+ */
+# define NR_BASES	3
+# define BASE_LOCAL	0
+# define BASE_GLOBAL	1
+# define BASE_DEF	2
 #else
 # define NR_BASES	1
-# define BASE_STD	0
+# define BASE_LOCAL	0
+# define BASE_GLOBAL	0
 # define BASE_DEF	0
 #endif
 
@@ -899,7 +905,10 @@ static int detach_if_pending(struct timer_list *timer, struct timer_base *base,
 
 static inline struct timer_base *get_timer_cpu_base(u32 tflags, u32 cpu)
 {
-	struct timer_base *base = per_cpu_ptr(&timer_bases[BASE_STD], cpu);
+	int index = tflags & TIMER_PINNED ? BASE_LOCAL : BASE_GLOBAL;
+	struct timer_base *base;
+
+	base = per_cpu_ptr(&timer_bases[index], cpu);
 
 	/*
 	 * If the timer is deferrable and NO_HZ_COMMON is set then we need
@@ -912,7 +921,10 @@ static inline struct timer_base *get_timer_cpu_base(u32 tflags, u32 cpu)
 
 static inline struct timer_base *get_timer_this_cpu_base(u32 tflags)
 {
-	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
+	int index = tflags & TIMER_PINNED ? BASE_LOCAL : BASE_GLOBAL;
+	struct timer_base *base;
+
+	base = this_cpu_ptr(&timer_bases[index]);
 
 	/*
 	 * If the timer is deferrable and NO_HZ_COMMON is set then we need
@@ -1971,9 +1983,10 @@ static unsigned long next_timer_interrupt(struct timer_base *base,
 static inline u64 __get_next_timer_interrupt(unsigned long basej, u64 basem,
 					     bool *idle)
 {
-	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
+	unsigned long nextevt, nextevt_local, nextevt_global;
+	struct timer_base *base_local, *base_global;
 	u64 expires = KTIME_MAX;
-	unsigned long nextevt;
+	bool local_first;
 
 	/*
 	 * Pretend that there is no timer pending if the cpu is offline.
@@ -1985,10 +1998,31 @@ static inline u64 __get_next_timer_interrupt(unsigned long basej, u64 basem,
 		return expires;
 	}
 
-	raw_spin_lock(&base->lock);
-	nextevt = next_timer_interrupt(base, basej);
+	base_local = this_cpu_ptr(&timer_bases[BASE_LOCAL]);
+	base_global = this_cpu_ptr(&timer_bases[BASE_GLOBAL]);
+
+	raw_spin_lock(&base_local->lock);
+	raw_spin_lock_nested(&base_global->lock, SINGLE_DEPTH_NESTING);
+
+	nextevt_local = next_timer_interrupt(base_local, basej);
+	nextevt_global = next_timer_interrupt(base_global, basej);
 
-	if (base->timers_pending) {
+	/*
+	 * Check whether the local event is expiring before or at the same
+	 * time as the global event.
+	 *
+	 * Note, that nextevt_global and nextevt_local might be based on
+	 * different base->clk values. So it's not guaranteed that
+	 * comparing with empty bases results in a correct local_first.
+	 */
+	if (base_local->timers_pending && base_global->timers_pending)
+		local_first = time_before_eq(nextevt_local, nextevt_global);
+	else
+		local_first = base_local->timers_pending;
+
+	nextevt = local_first ? nextevt_local : nextevt_global;
+
+	if (base_local->timers_pending || base_global->timers_pending) {
 		/* If we missed a tick already, force 0 delta */
 		if (time_before(nextevt, basej))
 			nextevt = basej;
@@ -1999,28 +2033,33 @@ static inline u64 __get_next_timer_interrupt(unsigned long basej, u64 basem,
 	 * We have a fresh next event. Check whether we can forward the
 	 * base.
 	 */
-	__forward_timer_base(base, basej);
+	__forward_timer_base(base_local, basej);
+	__forward_timer_base(base_global, basej);
 
 	/*
 	 * Set base->is_idle only when caller is timer_base_try_to_set_idle()
 	 */
 	if (idle) {
 		/*
-		 * Base is idle if the next event is more than a tick away.
+		 * Bases are idle if the next event is more than a tick away.
 		 *
 		 * If the base is marked idle then any timer add operation must
 		 * forward the base clk itself to keep granularity small. This
-		 * idle logic is only maintained for the BASE_STD base,
-		 * deferrable timers may still see large granularity skew (by
-		 * design).
+		 * idle logic is only maintained for the BASE_LOCAL and
+		 * BASE_GLOBAL base, deferrable timers may still see large
+		 * granularity skew (by design).
 		 */
-		if (!base->is_idle)
-			base->is_idle = time_after(nextevt, basej + 1);
-		*idle = base->is_idle;
-		trace_timer_base_idle(base->is_idle, base->cpu);
+		if (!base_local->is_idle) {
+			bool is_idle = time_after(nextevt, basej + 1);
+
+			base_local->is_idle = base_global->is_idle = is_idle;
+		}
+		*idle = base_local->is_idle;
+		trace_timer_base_idle(base_local->is_idle, base_local->cpu);
 	}
 
-	raw_spin_unlock(&base->lock);
+	raw_spin_unlock(&base_global->lock);
+	raw_spin_unlock(&base_local->lock);
 
 	return cmp_next_hrtimer_event(basem, expires);
 }
@@ -2064,15 +2103,15 @@ u64 timer_base_try_to_set_idle(unsigned long basej, u64 basem, bool *idle)
  */
 void timer_clear_idle(void)
 {
-	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
-
 	/*
 	 * We do this unlocked. The worst outcome is a remote enqueue sending
 	 * a pointless IPI, but taking the lock would just make the window for
 	 * sending the IPI a few instructions smaller for the cost of taking
 	 * the lock in the exit from idle path.
 	 */
-	base->is_idle = false;
+	__this_cpu_write(timer_bases[BASE_LOCAL].is_idle, false);
+	__this_cpu_write(timer_bases[BASE_GLOBAL].is_idle, false);
+
 	trace_timer_base_idle(0, smp_processor_id());
 }
 #endif
@@ -2123,11 +2162,13 @@ static inline void __run_timers(struct timer_base *base)
  */
 static __latent_entropy void run_timer_softirq(struct softirq_action *h)
 {
-	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
+	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_LOCAL]);
 
 	__run_timers(base);
-	if (IS_ENABLED(CONFIG_NO_HZ_COMMON))
+	if (IS_ENABLED(CONFIG_NO_HZ_COMMON)) {
+		__run_timers(this_cpu_ptr(&timer_bases[BASE_GLOBAL]));
 		__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
+	}
 }
 
 /*
@@ -2135,7 +2176,7 @@ static __latent_entropy void run_timer_softirq(struct softirq_action *h)
  */
 static void run_local_timers(void)
 {
-	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
+	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_LOCAL]);
 
 	hrtimer_run_queues();
 
-- 
2.39.2
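The base selection that get_timer_cpu_base() and get_timer_this_cpu_base() perform after this patch can be modelled in userspace roughly as below. This is a hedged sketch, not kernel code: the SKETCH_TIMER_* flag values and the sketch_* helpers are hypothetical, per-CPU pointers and locking are left out, and only the CONFIG_NO_HZ_COMMON layout (three bases) is modelled. It also encodes the lock nesting rule from the new comment (lowest base index first) as a simple ordering check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for TIMER_DEFERRABLE and TIMER_PINNED; the real
 * values live in include/linux/timer.h. Only their use as independent bits
 * matters for this sketch.
 */
#define SKETCH_TIMER_DEFERRABLE	0x1u
#define SKETCH_TIMER_PINNED	0x2u

/* Base indices as defined by the patch for CONFIG_NO_HZ_COMMON=y. */
enum { BASE_LOCAL = 0, BASE_GLOBAL = 1, BASE_DEF = 2, NR_BASES = 3 };

/*
 * Model of the index choice: pinned timers go to BASE_LOCAL, all others to
 * BASE_GLOBAL, and deferrable timers (pinned or not) are redirected to
 * BASE_DEF, as the "If the timer is deferrable" branch does.
 */
static int sketch_base_index(uint32_t tflags)
{
	int index = (tflags & SKETCH_TIMER_PINNED) ? BASE_LOCAL : BASE_GLOBAL;

	if (tflags & SKETCH_TIMER_DEFERRABLE)
		index = BASE_DEF;

	assert(index >= 0 && index < NR_BASES);
	return index;
}

/*
 * Check the nesting rule from the patch comment: when several bases are
 * locked together, their indices must be taken in strictly ascending order.
 */
static bool sketch_lock_order_ok(const int *indices, int n)
{
	for (int i = 1; i < n; i++) {
		if (indices[i] <= indices[i - 1])
			return false;
	}
	return true;
}
```

With this layout a pinned deferrable timer still lands in BASE_DEF, which matches the commit message's statement that deferrable timers keep their own base regardless of pinning. Locking BASE_LOCAL before BASE_GLOBAL, as __get_next_timer_interrupt() does, passes the ordering check; the reverse order does not.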