Date: Fri, 8 Dec 2023 19:18:05 +0100
From: Sebastian Siewior
To: Anna-Maria Behnsen
Cc: linux-kernel@vger.kernel.org, Peter Zijlstra, John Stultz,
 Thomas Gleixner, Eric Dumazet, "Rafael J . Wysocki", Arjan van de Ven,
 "Paul E . McKenney", Frederic Weisbecker, Rik van Riel, Steven Rostedt,
 Giovanni Gherdovich, Lukasz Luba, "Gautham R . Shenoy",
 Srinivas Pandruvada, K Prateek Nayak
Subject: Re: [PATCH v9 30/32] timers: Implement the hierarchical pull model
Message-ID: <20231208181805.BbFDsoJe@linutronix.de>
References: <20231201092654.34614-1-anna-maria@linutronix.de>
 <20231201092654.34614-31-anna-maria@linutronix.de>
In-Reply-To: <20231201092654.34614-31-anna-maria@linutronix.de>

On 2023-12-01 10:26:52 [+0100], Anna-Maria Behnsen wrote:
> diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
> new file mode 100644
> index 000000000000..05cd8f1bc45d
> --- /dev/null
> +++ b/kernel/time/timer_migration.c
> @@ -0,0 +1,1636 @@
…
> + * Required event and timerqueue update after a remote expiry:
> + * -----------------------------------------------------------
> + *
> + * After a remote expiry of a CPU, a walk through the hierarchy updating the
> + * events and timerqueues has to be done when there is a 'new' global timer of
> + * the remote CPU (which is obvious) but also if there is no new global timer,
> + * but the remote CPU is still idle:

After expiring timers of a remote CPU, a walk through the hierarchy to
update the events and timerqueues is required. It is obviously needed if
there is a 'new' global timer, but also if there is no new global timer
and the remote CPU is still idle.

> + *  1. CPU2 is the migrator and does the remote expiry in GRP1:0; expiry of
> + *     evt-CPU0 and evt-CPU1 are equal:

CPU0 and CPU1 both have a timer expiring at the same time, so both have
an event enqueued in the timerqueue of GRP0:0. CPU2 and CPU3 have no
global timer pending, and CPU2 is the only active CPU and also the
migrator.

> + *
> + *    LVL 1            [GRP1:0]
> + *                     migrator = GRP0:1
> + *                     active   = GRP0:1
> + *                 --> timerqueue = evt-GRP0:0
> + *                   /                \
> + *    LVL 0  [GRP0:0]                  [GRP0:1]
> + *           migrator = TMIGR_NONE     migrator = CPU2
> + *           active   =                active   = CPU2
> + *           groupevt.ignore = false   groupevt.ignore = true
> + *           groupevt.cpu = CPU0       groupevt.cpu =
> + *           timerqueue = evt-CPU0,    timerqueue =
> + *                        evt-CPU1
> + *              /         \                /         \
> + *    CPUs     0           1              2           3
> + *             idle        idle           active      idle
> + *
> + *  2. Remove the first event of the timerqueue in GRP1:0 and expire the timers
> + *     of CPU0 (see evt-GRP0:0->cpu value):

CPU2 begins to expire remote timers. It starts with its own group,
GRP0:1. GRP0:1 has nothing in its timerqueue, so CPU2 continues with the
parent, GRP1:0. In GRP1:0 it dequeues the first event, looks at the
event's CPU member and expires the pending timer of CPU0.

> + *    LVL 1            [GRP1:0]
> + *                     migrator = GRP0:1
> + *                     active   = GRP0:1
> + *                 --> timerqueue =
> + *                   /                \
> + *    LVL 0  [GRP0:0]                  [GRP0:1]
> + *           migrator = TMIGR_NONE     migrator = CPU2
> + *           active   =                active   = CPU2
> + *           groupevt.ignore = false   groupevt.ignore = true
> + *       --> groupevt.cpu = CPU0       groupevt.cpu =
> + *           timerqueue = evt-CPU0,    timerqueue =
> + *                        evt-CPU1
> + *              /         \                /         \
> + *    CPUs     0           1              2           3
> + *             idle        idle           active      idle
> + *
> + *  3. After the remote expiry CPU0 has no global timer that needs to be
> + *     enqueued. When skipping the walk, the global timer of CPU1 is not handled,
> + *     as the group event of GRP0:0 is not updated and not enqueued into GRP1:0. The
> + *     walk has to be done to update the group events and timerqueues:

The work isn't over after expiring timers of CPU0.
If we stop here, then CPU1's timer has not been expired and the
timerqueue of GRP0:0 still has an event enqueued for CPU0 which has just
been processed. So it is required to walk the hierarchy from CPU0's
point of view and update it accordingly. CPU0 will be removed from the
timerqueue because it has no pending timer. If CPU0 had a timer pending,
then it would have to expire after CPU1's first timer because all timers
from this period have just been expired. Either way CPU1 will be first
in GRP0:0's timerqueue and therefore set in the CPU field of the group
event which is enqueued in GRP1:0's timerqueue.

> + *    LVL 1            [GRP1:0]
> + *                     migrator = GRP0:1
> + *                     active   = GRP0:1
> + *                 --> timerqueue = evt-GRP0:0
> + *                   /                \
> + *    LVL 0  [GRP0:0]                  [GRP0:1]
> + *           migrator = TMIGR_NONE     migrator = CPU2
> + *           active   =                active   = CPU2
> + *           groupevt.ignore = false   groupevt.ignore = true
> + *       --> groupevt.cpu = CPU1       groupevt.cpu =
> + *       --> timerqueue = evt-CPU1     timerqueue =
> + *              /         \                /         \
> + *    CPUs     0           1              2           3
> + *             idle        idle           active      idle
> + *
> + * Now CPU2 (migrator) is able to handle the timer of CPU1 as CPU2 only scans
> + * the timerqueues of GRP0:1 and GRP1:0.

Now CPU2 continues step 2 at GRP1:0 and will expire the timer of CPU1.

> + * The update of step 3 is valid to be skipped, when the remote CPU went offline
> + * in the meantime because an update was already done during inactive path. When
> + * CPU became active in the meantime, update isn't required as well, because
> + * GRP0:0 is now longer idle.

The hierarchy walk in step 3 can be skipped if the migrator notices that
a CPU of GRP0:0 is active. That CPU will mark GRP0:0 active and take
care of the group and any needed updates within the hierarchy.

I skipped the "offline" part because it is not needed. Before the CPU
can go offline it first has to come out of idle. While going offline it
won't (probably) participate here and the remaining timers will be
migrated to another CPU.
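
To make the point about step 3 concrete, here is a rough userspace model
of the scenario. It is not code from the patch: struct evt, struct grp,
first_evt(), pop_expired(), walk_update() and remote_expiry_demo() are
made-up names, the timerqueue is reduced to two child slots per group,
and the ignore flag is folded into "queued". It only replays steps 1-3
from the comment:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_EXPIRY UINT64_MAX	/* stands in for KTIME_MAX */
#define NO_CPU    (-1)

/* Made-up, simplified stand-ins for tmigr_event and tmigr_group. */
struct evt {
	uint64_t expires;
	int cpu;
	int queued;	/* 1 while the event sits in the parent's queue */
};

struct grp {
	struct grp *parent;
	struct evt *child_evt[2];	/* events of the two children */
	struct evt groupevt;		/* this group's event in its parent */
};

/* Earliest queued child event or NULL; ties go to the lower index. */
static struct evt *first_evt(struct grp *g)
{
	struct evt *first = NULL;

	for (int i = 0; i < 2; i++) {
		struct evt *e = g->child_evt[i];

		if (e && e->queued && (!first || e->expires < first->expires))
			first = e;
	}
	return first;
}

/* Dequeue and return the first event if it has already expired. */
static struct evt *pop_expired(struct grp *g, uint64_t now)
{
	struct evt *e = first_evt(g);

	if (!e || now < e->expires)
		return NULL;
	e->queued = 0;
	return e;
}

/* Step 3: refresh the CPU event, then every group event up to the top. */
static void walk_update(struct grp *g, struct evt *cpu_evt, uint64_t next)
{
	cpu_evt->expires = next;
	cpu_evt->queued = (next != NO_EXPIRY);

	for (; g; g = g->parent) {
		struct evt *first = first_evt(g);

		g->groupevt.expires = first ? first->expires : NO_EXPIRY;
		g->groupevt.cpu = first ? first->cpu : NO_CPU;
		g->groupevt.queued = first != NULL;
	}
}

/*
 * Replay steps 1-3 with CPU2 as migrator. Returns the CPU whose timers the
 * migrator finds next in GRP1:0, or NO_CPU if that queue looks empty.
 */
int remote_expiry_demo(int skip_walk)
{
	struct evt cpu0 = { 100, 0, 1 }, cpu1 = { 100, 1, 1 };
	struct grp grp10 = { 0 };
	struct grp grp00 = { &grp10, { &cpu0, &cpu1 }, { 100, 0, 1 } };
	struct grp grp01 = { &grp10, { NULL, NULL }, { NO_EXPIRY, NO_CPU, 0 } };
	struct evt *e;

	grp10.child_evt[0] = &grp00.groupevt;
	grp10.child_evt[1] = &grp01.groupevt;

	/* Step 2: GRP0:1 has nothing queued, so continue with GRP1:0. */
	e = pop_expired(&grp01, 100);
	assert(!e);
	e = pop_expired(&grp10, 100);
	assert(e && e->cpu == 0);	/* evt-GRP0:0 points at CPU0 */

	/* CPU0's timers get expired here; it has no new global timer. */
	if (!skip_walk)
		walk_update(&grp00, &cpu0, NO_EXPIRY);

	/* Continue step 2 at GRP1:0. */
	e = pop_expired(&grp10, 100);
	return e ? e->cpu : NO_CPU;
}
```

In this model remote_expiry_demo(0) hands CPU1 to the migrator, while
remote_expiry_demo(1), which skips the walk, finds GRP1:0's queue empty
and so never gets to CPU1's timer.
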
> + */

…

> +
> +typedef bool (*up_f)(struct tmigr_group *, struct tmigr_group *, void *);
> +
> +static void __walk_groups(up_f up, void *data,
> +			  struct tmigr_cpu *tmc)
> +{
> +	struct tmigr_group *child = NULL, *group = tmc->tmgroup;
> +
> +	do {
> +		WARN_ON_ONCE(group->level >= tmigr_hierarchy_levels);
> +
> +		if (up(group, child, data))
> +			break;
> +
> +		child = group;
> +		group = group->parent;
> +	} while (group);
> +}
> +
> +static void walk_groups(up_f up, void *data, struct tmigr_cpu *tmc)
> +{
> +	lockdep_assert_held(&tmc->lock);
> +
> +	__walk_groups(up, data, tmc);
> +}

So these two. All users of walk_groups() have tmigr_cpu::lock acquired
and the users of __walk_groups() don't. Also the `up' function passed to
walk_groups() always takes the same data type, while the data argument
passed to __walk_groups() also always has the same type, but a different
one compared to the former. Given the locking situation and the type of
the data argument, it looks like walk_groups() is used for thing#1 and
__walk_groups() for thing#2. Therefore it could make sense to have two
separate functions (instead of walk_groups() and __walk_groups()) to
distinguish this. Now it is too late, but maybe later I figure out why
the one type requires locking and the other doesn't.

…

> +/*
> + * Return the next event which is already expired of the group timerqueue
> + *
> + * Event, which is returned, is also removed from the queue.
> + */
> +static struct tmigr_event *tmigr_next_expired_groupevt(struct tmigr_group *group,
> +							u64 now)
> +{
> +	struct tmigr_event *evt = tmigr_next_groupevt(group);
> +
> +	if (!evt || now < evt->nextevt.expires)
> +		return NULL;
> +
> +	/*
> +	 * The event is already expired. Remove it. If it's not the last event,
> +	 * then update all group event related information.
> +	 */

	The event expired, remove it. Update the group's next expiry time.

> +	if (timerqueue_del(&group->events, &evt->nextevt))
> +		tmigr_next_groupevt(group);
> +	else
> +		WRITE_ONCE(group->next_expiry, KTIME_MAX);

And then you can invoke tmigr_next_groupevt() unconditionally.

> +	return evt;
> +}
> +

Sebastian
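
For reference, a rough userspace sketch of how
tmigr_next_expired_groupevt() could look with both remarks applied. This
is a model, not the patch: struct evt, struct grp and the sorted array
are made-up stand-ins for tmigr_event, tmigr_group and the timerqueue.
It also assumes that tmigr_next_groupevt() itself resets next_expiry to
KTIME_MAX before it looks at the queue, which is what would make the
unconditional call safe:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_EXPIRY UINT64_MAX	/* stands in for KTIME_MAX */
#define MAX_EVTS  4

struct evt {
	uint64_t expires;
};

/* Made-up stand-in for tmigr_group: a sorted array, earliest first. */
struct grp {
	struct evt *events[MAX_EVTS];
	size_t nr;
	uint64_t next_expiry;
};

/* Assumed behaviour: reset next_expiry, then publish the first event. */
static struct evt *next_groupevt(struct grp *g)
{
	assert(g->nr <= MAX_EVTS);

	g->next_expiry = NO_EXPIRY;
	if (!g->nr)
		return NULL;

	g->next_expiry = g->events[0]->expires;
	return g->events[0];
}

/* Return the first event if it has expired and remove it from the queue. */
static struct evt *next_expired_groupevt(struct grp *g, uint64_t now)
{
	struct evt *evt = next_groupevt(g);

	if (!evt || now < evt->expires)
		return NULL;

	/* The event expired, remove it. Update the group's next expiry time. */
	for (size_t i = 1; i < g->nr; i++)
		g->events[i - 1] = g->events[i];
	g->nr--;
	next_groupevt(g);

	return evt;
}
```

With the reset done inside next_groupevt(), the caller no longer needs
to know whether it just removed the last event of the queue.
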