Date: Wed, 6 Dec 2023 17:35:36 +0100
From: Sebastian Siewior
To: Anna-Maria Behnsen
Cc: linux-kernel@vger.kernel.org, Peter Zijlstra, John Stultz, Thomas Gleixner, Eric Dumazet, "Rafael J . Wysocki", Arjan van de Ven, "Paul E . McKenney", Frederic Weisbecker, Rik van Riel, Steven Rostedt, Giovanni Gherdovich, Lukasz Luba, "Gautham R . Shenoy", Srinivas Pandruvada, K Prateek Nayak
Subject: Re: [PATCH v9 30/32] timers: Implement the hierarchical pull model
Message-ID: <20231206163536.r9DcrsWQ@linutronix.de>
References: <20231201092654.34614-1-anna-maria@linutronix.de> <20231201092654.34614-31-anna-maria@linutronix.de>
In-Reply-To: <20231201092654.34614-31-anna-maria@linutronix.de>

On 2023-12-01 10:26:52 [+0100], Anna-Maria Behnsen wrote:
…
> As long as a CPU is busy it expires both local and global timers. When a
> CPU goes idle it arms for the first expiring local timer. If the first
> expiring pinned (local) timer is before the first expiring movable timer,
> then no action is required because the CPU will wake up before the first
> movable timer expires. If the first expiring movable timer is before the
> first expiring pinned (local) timer, then this timer is queued into a idle
                                                                      an
> timerqueue and eventually expired by some other active CPU.
                                       s/some other/another ?

…
>
> Signed-off-by: Anna-Maria Behnsen
> ---
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index b6c9ac0c3712..ac3e888d053f 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -2103,6 +2104,64 @@ void timer_lock_remote_bases(unsigned int cpu)
…
> +static void timer_use_tmigr(unsigned long basej, u64 basem,
> +			    unsigned long *nextevt, bool *tick_stop_path,
> +			    bool timer_base_idle, struct timer_events *tevt)
> +{
> +	u64 next_tmigr;
> +
> +	if (timer_base_idle)
> +		next_tmigr = tmigr_cpu_new_timer(tevt->global);
> +	else if (tick_stop_path)
> +		next_tmigr = tmigr_cpu_deactivate(tevt->global);
> +	else
> +		next_tmigr = tmigr_quick_check();
> +
> +	/*
> +	 * If the CPU is the last going idle in timer migration hierarchy, make
> +	 * sure the CPU will wake up in time to handle remote timers.
> +	 * next_tmigr == KTIME_MAX if other CPUs are still active.
> +	 */
> +	if (next_tmigr < tevt->local) {
> +		u64 tmp;
> +
> +		/* If we missed a tick already, force 0 delta */
> +		if (next_tmigr < basem)
> +			next_tmigr = basem;
> +
> +		tmp = div_u64(next_tmigr - basem, TICK_NSEC);

Is this considered a hot path? Asking because u64 divs are nice if they
can be avoided ;)

I guess the original value is from fetch_next_timer_interrupt(). But
then you only need it if the caller (__get_next_timer_interrupt()) has
the `idle' value set. Otherwise the operation is pointless.
Would it somehow work to replace
	base_local->is_idle = time_after(nextevt, basej + 1);

with maybe something like
	base_local->is_idle = tevt.local > basem + TICK_NSEC

If so you could avoid the `nextevt' maneuver.

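To make the question concrete, here is a rough userspace sketch, not
kernel code. Everything in it is made up for illustration:
SKETCH_TICK_NSEC assumes HZ=1000, sketch_time_after() stands in for
time_after(), and idle_via_div() / idle_via_cmp() are hypothetical
helpers that model the two forms of the is_idle check. It only compares
the division based form with the plain comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: HZ=1000, so one tick is 1,000,000 ns. */
#define SKETCH_TICK_NSEC 1000000ULL

/* Userspace stand-in for time_after(a, b): true if a is after b. */
static bool sketch_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Division based form: clamp the expiry to basem, convert the delta to
 * jiffies as in the quoted hunk, then test it the way is_idle is
 * computed today.
 */
static bool idle_via_div(unsigned long basej, uint64_t basem, uint64_t expiry)
{
	uint64_t delta = expiry < basem ? 0 : expiry - basem;
	unsigned long nextevt = basej + (unsigned long)(delta / SKETCH_TICK_NSEC);

	return sketch_time_after(nextevt, basej + 1);
}

/* Comparison only form: the replacement asked about, no division. */
static bool idle_via_cmp(uint64_t basem, uint64_t expiry)
{
	return expiry > basem + SKETCH_TICK_NSEC;
}
```

In this sketch both helpers agree when the expiry sits exactly on a tick
boundary relative to basem. For an expiry more than one but less than
two ticks ahead, idle_via_div() says no and idle_via_cmp() says yes, so
that window would need a look before switching.
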
> +		*nextevt = basej + (unsigned long)tmp;
> +		tevt->local = next_tmigr;
> +	}
> +}
> +# else
…
> @@ -2132,6 +2190,21 @@ static inline u64 __get_next_timer_interrupt(unsigned long basej, u64 basem,
>  	nextevt = fetch_next_timer_interrupt(basej, basem, base_local,
>  					     base_global, &tevt);
>
> +	/*
> +	 * When the when the next event is only one jiffie ahead there is no

If the next event is only one jiffy ahead then there is no

> +	 * need to call timer migration hierarchy related
> +	 * functions. @tevt->global will be KTIME_MAX, nevertheless if the next
> +	 * timer is a global timer. This is also true, when the timer base is

The second sentence is hard to parse.

> +	 * idle.
> +	 *
> +	 * The proper timer migration hierarchy function depends on the callsite
> +	 * and whether timer base is idle or not. @nextevt will be updated when
> +	 * this CPU needs to handle the first timer migration hierarchy event.
> +	 */
> +	if (time_after(nextevt, basej + 1))
> +		timer_use_tmigr(basej, basem, &nextevt, idle,
> +				base_local->is_idle, &tevt);
> +
>  	/*
>  	 * We have a fresh next event. Check whether we can forward the
>  	 * base.
> diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
> new file mode 100644
> index 000000000000..05cd8f1bc45d
> --- /dev/null
> +++ b/kernel/time/timer_migration.c
> @@ -0,0 +1,1636 @@
…
> +/*
> + * The timer migration mechanism is built on a hierarchy of groups. The
> + * lowest level group contains CPUs, the next level groups of CPU groups
> + * and so forth. The CPU groups are kept per node so for the normal case
> + * lock contention won't happen across nodes. Depending on the number of
> + * CPUs per node even the next level might be kept as groups of CPU groups
> + * per node and only the levels above cross the node topology.
> + *
> + * Example topology for a two node system with 24 CPUs each.
> + *
> + * LVL 2                           [GRP2:0]
> + *                              GRP1:0 = GRP1:M
> + *
> + * LVL 1            [GRP1:0]                      [GRP1:1]
> + *               GRP0:0 - GRP0:2               GRP0:3 - GRP0:5
> + *
> + * LVL 0  [GRP0:0]  [GRP0:1]  [GRP0:2]  [GRP0:3]  [GRP0:4]  [GRP0:5]
> + * CPUS     0-7       8-15      16-23     24-31     32-39     40-47

In the CPUS list there is a tab between 24-31 and 32-39 while the other
separators are spaces. Could you please align it with spaces? Judging
from the top you have tabstop=8 but here tabstop=4 looks "nice".

> + *
> + * The groups hold a timer queue of events sorted by expiry time. These
> + * queues are updated when CPUs go in idle. When they come out of idle
> + * ignore flag of events is set.
> + *

Sebastian