Date: Wed, 30 Mar 2022 18:05:35 +0200
From: Peter Zijlstra
To: Ingo Molnar, Steven Rostedt, vincent.guittot@linaro.org
Cc: LKML, Thomas Gleixner, Sebastian Andrzej Siewior, joel@joelfernandes.org, dietmar.eggemann@arm.com, bsegall@google.com, mgorman@suse.de, bristot@redhat.com
Subject: [PATCH] sched/core: Fix forceidle balancing
Message-ID: <20220330160535.GN8939@worktop.programming.kicks-ass.net>

Steve reported that ChromeOS encounters the forceidle balancer being run
from rt_mutex_setprio()'s balance_callback() invocation and explodes.

Now, the forceidle balancer gets queued every time the idle task gets
selected, via set_next_task(), which is strictly too often.
rt_mutex_setprio() also uses set_next_task() in the 'change' pattern:

	queued = task_on_rq_queued(p);	/* p->on_rq == TASK_ON_RQ_QUEUED */
	running = task_current(rq, p);	/* rq->curr == p */

	if (queued)
		dequeue_task(...);
	if (running)
		put_prev_task(...);

	/* change task properties */

	if (queued)
		enqueue_task(...);
	if (running)
		set_next_task(...);

However, rt_mutex_setprio() will explicitly not run this pattern on the
idle task (since priority-boosting the idle task is quite insane). Most
other 'change' pattern users are pidhash based and would also not apply
to the idle task.

Also, the change pattern doesn't contain a __balance_callback()
invocation and hence we could have an out-of-band balance callback,
which *should* trigger the WARN in rq_pin_lock() (which guards against
this exact anti-pattern).

So while none of that explains how this happens, it does indicate that
having it in set_next_task() might not be the most robust option.

Instead, explicitly queue the forceidle balancer from pick_next_task()
when it does indeed result in forceidle selection. Having it here
ensures it can only be triggered under the __schedule() rq->lock
instance, and hence must be run from that context.

This also happens to clean up the code a little, so win-win.
Fixes: d2dfa17bc7de ("sched: Trivial forced-newidle balancer")
Reported-by: Steven Rostedt
Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/core.c  | 16 +++++++++++-----
 kernel/sched/idle.c  |  1 -
 kernel/sched/sched.h |  6 ------
 3 files changed, 11 insertions(+), 12 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5752,6 +5752,8 @@ static inline struct task_struct *pick_t
 
 extern void task_vruntime_update(struct rq *rq, struct task_struct *p, bool in_fi);
 
+static void queue_core_balance(struct rq *rq);
+
 static struct task_struct *
 pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
@@ -5801,7 +5803,7 @@ pick_next_task(struct rq *rq, struct tas
 		}
 
 		rq->core_pick = NULL;
-		return next;
+		goto out;
 	}
 
 	put_prev_task_balance(rq, prev, rf);
@@ -5851,7 +5853,7 @@ pick_next_task(struct rq *rq, struct tas
 			 */
 			WARN_ON_ONCE(fi_before);
 			task_vruntime_update(rq, next, false);
-			goto done;
+			goto out_set_next;
 		}
 	}
 
@@ -5970,8 +5972,12 @@ pick_next_task(struct rq *rq, struct tas
 		resched_curr(rq_i);
 	}
 
-done:
+out_set_next:
 	set_next_task(rq, next);
+out:
+	if (rq->core->core_forceidle_count && next == rq->idle)
+		queue_core_balance(rq);
+
 	return next;
 }
 
@@ -6066,7 +6072,7 @@ static void sched_core_balance(struct rq
 
 static DEFINE_PER_CPU(struct callback_head, core_balance_head);
 
-void queue_core_balance(struct rq *rq)
+static void queue_core_balance(struct rq *rq)
 {
 	if (!sched_core_enabled(rq))
 		return;
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -434,7 +434,6 @@ static void set_next_task_idle(struct rq
 {
 	update_idle_core(rq);
 	schedstat_inc(rq->sched_goidle);
-	queue_core_balance(rq);
 }
 
 #ifdef CONFIG_SMP
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1232,8 +1232,6 @@ static inline bool sched_group_cookie_ma
 	return false;
 }
 
-extern void queue_core_balance(struct rq *rq);
-
 static inline bool sched_core_enqueued(struct task_struct *p)
 {
 	return !RB_EMPTY_NODE(&p->core_node);
 }
@@ -1267,10 +1265,6 @@ static inline raw_spinlock_t *__rq_lockp
 	return &rq->__lock;
 }
 
-static inline void queue_core_balance(struct rq *rq)
-{
-}
-
 static inline bool sched_cpu_cookie_match(struct rq *rq, struct task_struct *p)
 {
 	return true;
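
For readers not familiar with the balance-callback machinery referenced
above, the sketch below is a stand-alone userspace model of the idea
behind the fix; it is not kernel code, and all toy_* names are invented
for illustration. The point it demonstrates: the callback is queued only
from the pick path, i.e. only while the "rq lock" is already held, and
it is flushed before that lock instance is released, so it can never be
left dangling for an unrelated lock holder such as rt_mutex_setprio().

	/* Illustrative sketch only -- NOT kernel code. */
	#include <stdbool.h>
	#include <stdio.h>

	struct toy_rq;

	struct toy_callback {
		void (*func)(struct toy_rq *rq);
	};

	struct toy_rq {
		bool lock_held;			/* stands in for rq->lock */
		bool forceidle;			/* "pick resulted in forced idle" */
		struct toy_callback *pending;	/* stands in for rq->balance_callback */
	};

	static void forceidle_balance(struct toy_rq *rq)
	{
		/* In the kernel this would pull work from a forced-idle sibling. */
		printf("forceidle balance runs, lock_held=%d\n", rq->lock_held);
	}

	static struct toy_callback core_balance_cb = { .func = forceidle_balance };

	/*
	 * As in the patch: queue the callback from the pick path, i.e. only
	 * while the lock is already held, and only when the pick actually
	 * resulted in forced idle.
	 */
	static void toy_pick_next_task(struct toy_rq *rq)
	{
		if (rq->forceidle)
			rq->pending = &core_balance_cb;
	}

	/*
	 * The callback is flushed right after the pick, under the same lock
	 * instance, so it never escapes the scheduling path.
	 */
	static void toy_schedule(struct toy_rq *rq)
	{
		rq->lock_held = true;
		toy_pick_next_task(rq);
		if (rq->pending) {
			struct toy_callback *cb = rq->pending;
			rq->pending = NULL;
			cb->func(rq);
		}
		rq->lock_held = false;
	}

	int main(void)
	{
		struct toy_rq rq = { .forceidle = true };
		toy_schedule(&rq);	/* prints: forceidle balance runs, lock_held=1 */
		return 0;
	}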