From: Vincent Guittot
Date: Tue, 12 Dec 2023 12:06:33 +0100
Subject: Re: [PATCH 1/4] sched/fair: Be less aggressive in calling cpufreq_update_util()
To: Qais Yousef
Cc: Ingo Molnar, Peter Zijlstra, "Rafael J. Wysocki", Viresh Kumar,
 Dietmar Eggemann, linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
 Lukasz Luba, Wei Wang, Rick Yiu, Chung-Kai Mei, Hongyan Xia
References: <20231208015242.385103-1-qyousef@layalina.io> <20231208015242.385103-2-qyousef@layalina.io>
In-Reply-To: <20231208015242.385103-2-qyousef@layalina.io>

On Fri, 8 Dec 2023 at 02:52, Qais Yousef wrote:
>
> Due to the way the code is structured, it makes a lot of sense to
> trigger cpufreq_update_util() from update_load_avg(). But this is too
> aggressive, as in most cases we are iterating through entities in a
> loop to update_load_avg() in the hierarchy. So we end up sending too
> many requests in a loop as we're updating the hierarchy.
>
> Combined with the rate limit in schedutil, we could end up prematurely
> sending a wrong frequency update before we have actually updated all
> entities appropriately.
>
> Be smarter about it by limiting the trigger to perform frequency
> updates only after all accounting logic is done. This ends up being at
> the following points:
>
> 1. enqueue/dequeue_task_fair()
> 2. throttle/unthrottle_cfs_rq()
> 3. attach/detach_task_cfs_rq()
> 4. task_tick_fair()
> 5. __sched_group_set_shares()
>
> This is still not 100% ideal due to other limitations that might be
> a bit harder to handle. Namely, we can end up with a premature update
> request in the following situations:
>
> a. Simultaneous task enqueues on the CPU where the 2nd task is bigger
>    and requires a higher freq. The trigger of cpufreq_update_util() by
>    the first task will lead to the 2nd request being dropped until the
>    tick, or until another CPU in the same policy triggers a freq
>    update.
>
> b. CPUs sharing a policy can end up with the same race as in (a), but
>    with the simultaneous enqueues happening on different CPUs in the
>    same policy.
>
> The above, though, are limitations in the governor/hardware, and from
> the scheduler's point of view at least that's the best we can do. The
> governor might consider smarter logic to aggregate near-simultaneous
> requests and honour the higher one.
>
> Signed-off-by: Qais Yousef (Google)
> ---
>  kernel/sched/fair.c | 55 ++++++++++++---------------------------------
>  1 file changed, 14 insertions(+), 41 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index b83448be3f79..f99910fc6705 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3997,29 +3997,6 @@ static inline void update_cfs_group(struct sched_entity *se)
>  }
>  #endif /* CONFIG_FAIR_GROUP_SCHED */
>
> -static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq, int flags)
> -{
> -	struct rq *rq = rq_of(cfs_rq);
> -
> -	if (&rq->cfs == cfs_rq) {
> -		/*
> -		 * There are a few boundary cases this might miss but it should
> -		 * get called often enough that that should (hopefully) not be
> -		 * a real problem.
> -		 *
> -		 * It will not get called when we go idle, because the idle
> -		 * thread is a different class (!fair), nor will the utilization
> -		 * number include things like RT tasks.
> -		 *
> -		 * As is, the util number is not freq-invariant (we'd have to
> -		 * implement arch_scale_freq_capacity() for that).
> -		 *
> -		 * See cpu_util_cfs().
> -		 */
> -		cpufreq_update_util(rq, flags);
> -	}
> -}
> -
>  #ifdef CONFIG_SMP
>  static inline bool load_avg_is_decayed(struct sched_avg *sa)
>  {
> @@ -4648,8 +4625,6 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
>
>  	add_tg_cfs_propagate(cfs_rq, se->avg.load_sum);
>
> -	cfs_rq_util_change(cfs_rq, 0);
> -
>  	trace_pelt_cfs_tp(cfs_rq);
>  }
>
> @@ -4678,8 +4653,6 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
>
>  	add_tg_cfs_propagate(cfs_rq, -se->avg.load_sum);
>
> -	cfs_rq_util_change(cfs_rq, 0);
> -
>  	trace_pelt_cfs_tp(cfs_rq);
>  }
>
> @@ -4726,11 +4699,8 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
>  		 */
>  		detach_entity_load_avg(cfs_rq, se);
>  		update_tg_load_avg(cfs_rq);
> -	} else if (decayed) {
> -		cfs_rq_util_change(cfs_rq, 0);
> -
> -		if (flags & UPDATE_TG)
> -			update_tg_load_avg(cfs_rq);
> +	} else if (decayed && (flags & UPDATE_TG)) {
> +		update_tg_load_avg(cfs_rq);
>  	}
>  }
>
> @@ -5114,7 +5084,6 @@ static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
>
>  static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int not_used1)
>  {
> -	cfs_rq_util_change(cfs_rq, 0);
>  }
>
>  static inline void remove_entity_load_avg(struct sched_entity *se) {}
> @@ -5807,6 +5776,8 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
>  	sub_nr_running(rq, task_delta);
>
>  done:
> +	cpufreq_update_util(rq, 0);
> +
>  	/*
>  	 * Note: distribution will already see us throttled via the
>  	 * throttled-list. rq->lock protects completion.
> @@ -5899,6 +5870,8 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
>  unthrottle_throttle:
>  	assert_list_leaf_cfs_rq(rq);
>
> +	cpufreq_update_util(rq, 0);
> +
>  	/* Determine whether we need to wake up potentially idle CPU: */
>  	if (rq->curr == rq->idle && rq->cfs.nr_running)
>  		resched_curr(rq);
> @@ -6704,14 +6677,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  	 */
>  	util_est_enqueue(&rq->cfs, p);
>
> -	/*
> -	 * If in_iowait is set, the code below may not trigger any cpufreq
> -	 * utilization updates, so do it here explicitly with the IOWAIT flag
> -	 * passed.
> -	 */
> -	if (p->in_iowait)
> -		cpufreq_update_util(rq, SCHED_CPUFREQ_IOWAIT);
> -
>  	for_each_sched_entity(se) {
>  		if (se->on_rq)
>  			break;
> @@ -6772,6 +6737,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  enqueue_throttle:
>  	assert_list_leaf_cfs_rq(rq);
>

Here and in the other places below, you lose:

-	} else if (decayed) {

The decayed condition ensures a rate limit (~1ms) on the number of
calls to cpufreq_update_util(). enqueue/dequeue/tick don't create any
sudden change in the PELT signals that would require updating cpufreq,
unlike attach/detach.

> +	cpufreq_update_util(rq, p->in_iowait ? SCHED_CPUFREQ_IOWAIT : 0);
> +
>  	hrtick_update(rq);
>  }
>
> @@ -6849,6 +6816,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>
>  dequeue_throttle:
>  	util_est_update(&rq->cfs, p, task_sleep);
> +	cpufreq_update_util(rq, 0);
>  	hrtick_update(rq);
>  }
>
> @@ -8482,6 +8450,7 @@ done: __maybe_unused;
>
>  	update_misfit_status(p, rq);
>  	sched_fair_update_stop_tick(rq, p);
> +	cpufreq_update_util(rq, 0);
>
>  	return p;
>
> @@ -12615,6 +12584,7 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
>
>  	update_misfit_status(curr, rq);
>  	update_overutilized_status(task_rq(curr));
> +	cpufreq_update_util(rq, 0);
>
>  	task_tick_core(rq, curr);
>  }
> @@ -12739,6 +12709,7 @@ static void detach_task_cfs_rq(struct task_struct *p)
>  	struct sched_entity *se = &p->se;
>
>  	detach_entity_cfs_rq(se);
> +	cpufreq_update_util(task_rq(p), 0);
>  }
>
>  static void attach_task_cfs_rq(struct task_struct *p)
> @@ -12746,6 +12717,7 @@ static void attach_task_cfs_rq(struct task_struct *p)
>  	struct sched_entity *se = &p->se;
>
>  	attach_entity_cfs_rq(se);
> +	cpufreq_update_util(task_rq(p), 0);
>  }
>
>  static void switched_from_fair(struct rq *rq, struct task_struct *p)
> @@ -12991,6 +12963,7 @@ static int __sched_group_set_shares(struct task_group *tg, unsigned long shares)
>  		update_load_avg(cfs_rq_of(se), se, UPDATE_TG);
>  		update_cfs_group(se);
>  	}
> +	cpufreq_update_util(rq, 0);
>  	rq_unlock_irqrestore(rq, &rf);
>  }
>
> --
> 2.34.1
>
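
To make the rate-limit point above concrete, here is a minimal,
self-contained sketch in plain C. It is not kernel code: struct
rq_sketch, the *_sketch helpers, and the way the period is computed are
all made up for illustration. It only shows why gating the governor
callback on "decayed" naturally throttles it to roughly one call per
1024us PELT window, whereas unconditional calls from the
enqueue/dequeue/tick paths are bounded only by how often those paths
run.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* One PELT window is 1024us; PELT can only report "decayed" when a
 * window boundary has been crossed, i.e. at most roughly once per ms. */
#define PELT_PERIOD_NS (1024 * 1024ULL)

/* Made-up per-runqueue state standing in for the real struct rq. */
struct rq_sketch {
	uint64_t last_period;      /* last PELT period number seen */
	unsigned int freq_updates; /* times the governor was poked */
};

/* Stand-in for cpufreq_update_util(): just count the calls. */
static void cpufreq_update_util_sketch(struct rq_sketch *rq)
{
	rq->freq_updates++;
}

/*
 * Old-style update: trigger a frequency update only when a PELT period
 * boundary was crossed ("decayed"), mimicking the removed
 * "else if (decayed)" branch in update_load_avg().
 */
static void update_load_avg_sketch(struct rq_sketch *rq, uint64_t now_ns)
{
	uint64_t period = now_ns / PELT_PERIOD_NS;
	bool decayed = (period != rq->last_period);

	rq->last_period = period;
	if (decayed)
		cpufreq_update_util_sketch(rq);
}

int main(void)
{
	struct rq_sketch rq = { .last_period = (uint64_t)-1 };
	uint64_t now;

	/* ~200 enqueue-like events spread over two PELT periods: the
	 * decayed gate collapses them into one update per period,
	 * where an unconditional call would have issued ~200. */
	for (now = 0; now < 2 * PELT_PERIOD_NS; now += PELT_PERIOD_NS / 100)
		update_load_avg_sketch(&rq, now);

	printf("governor updates: %u\n", rq.freq_updates); /* prints 2 */
	return 0;
}

The sketch only illustrates the throttling effect of the decayed gate
that the patch removes; whether the unconditional calls in the new
scheme are cheap enough in practice presumably depends on how much of
them the governor's own rate limiting absorbs.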