Subject: Re: [PATCH 1/2] sched/schedutil: rework performance estimation
From: Dietmar Eggemann
Date: Fri, 20 Oct 2023 11:48:25 +0200
To: Vincent Guittot, mingo@redhat.com, peterz@infradead.org,
    juri.lelli@redhat.com, rostedt@goodmis.org, bsegall@google.com,
    mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com,
    rafael@kernel.org, viresh.kumar@linaro.org, qyousef@layalina.io,
    linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Cc: lukasz.luba@arm.com
References: <20231013151450.257891-1-vincent.guittot@linaro.org> <20231013151450.257891-2-vincent.guittot@linaro.org>
In-Reply-To: <20231013151450.257891-2-vincent.guittot@linaro.org>

On 13/10/2023 17:14, Vincent Guittot wrote:
> The current method of taking uclamp hints into account when estimating the
> target frequency can end up in situations where the selected target
> frequency is higher than the uclamp hints even though there is no real
> need for it. Such cases mainly happen because we are currently mixing the
> traditional scheduler utilization signal with the uclamp performance
> hints.
> By adding these 2 metrics, we lose important information when it comes
> to selecting the target frequency, and we have to make assumptions
> which can't fit all cases.
>
> Rework the interface between the scheduler and the schedutil governor in
> order to propagate all information down to the cpufreq governor.

So we change from:

    max(util -> uclamp, iowait_boost -> uclamp) -> head_room()

to:

    util = max(util, iowait_boost)           ->  util = head_room(util)
    _min = max(irq + cpu_bw_dl, uclamp_min)  ->
                                                 -> max(_min, _max)
    _max = min(scale, uclamp_max)            ->  _max = min(util, _max)

> effective_cpu_util() changes its interface and now returns the actual
> utilization of the CPU together with 2 optional outputs:
> - The minimum performance for this CPU; typically the capacity to handle
>   the deadline tasks and the interrupt pressure, but also the uclamp_min
>   request when available.
> - The maximum targeted performance for this CPU, which reflects the
>   maximum level that we would like not to exceed. By default it is the
>   CPU capacity, but it can be reduced because of some performance hints
>   set with uclamp. The value can be lower than the actual utilization
>   and/or the minimum performance level.
>
> A new sugov_effective_cpu_perf() interface is also available to compute
> the final performance level that is targeted for the CPU, after applying
> some cpufreq headroom and taking all inputs into account.
>
> With these 2 functions, schedutil is now able to decide when it must go
> above the uclamp hints. It now also has a generic way to get the minimum
> performance level.
>
> The dependency between the energy model and the cpufreq governor and its
> headroom policy doesn't exist anymore.

But the dependency that both are doing the same thing still exists, right?

sugov_get_util() and eenv_pd_max_util() are calling the same functions:

    util = effective_cpu_util(cpu, util, &min, &max)
    /* ioboost, bw_min = head_room(min) resp. uclamp task handling */
    util = sugov_effective_cpu_perf(cpu, util, min, max)

[...]

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a3f9cd52eec5..78228abd1219 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7381,18 +7381,13 @@ int sched_core_idle_cpu(int cpu)
>   * required to meet deadlines.
>   */
>  unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> -                                 enum cpu_util_type type,
> -                                 struct task_struct *p)
> +                                 unsigned long *min,
> +                                 unsigned long *max)

FREQUENCY_UTIL relates to *min != NULL and *max != NULL,
ENERGY_UTIL relates to *min == NULL and *max == NULL,

so both must be either NULL or !NULL. Calling it with one equal to NULL and
the other !NULL should be undefined, right?
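IOW, the two remaining call patterns would look like this (just a sketch to
spell out the mapping, not actual kernel code; the local variables are made
up):

    unsigned long min, max, util;

    /* frequency selection (old FREQUENCY_UTIL): both pointers provided */
    util = effective_cpu_util(cpu, util_cfs, &min, &max);

    /* energy estimation (old ENERGY_UTIL): no min/max output wanted */
    util = effective_cpu_util(cpu, util_cfs, NULL, NULL);

A mixed call (one NULL, one !NULL) has no obvious meaning.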
[...]

> @@ -7400,45 +7395,36 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
>   * update_irq_load_avg().
>   */
>  	irq = cpu_util_irq(rq);
> -	if (unlikely(irq >= max))
> -		return max;
> +	if (unlikely(irq >= scale)) {
> +		if (min)
> +			*min = scale;
> +		if (max)
> +			*max = scale;
> +		return scale;
> +	}
> +
> +	/* The minimum utilization returns the highest level between:
> +	 * - the computed DL bandwidth needed with the irq pressure which
> +	 *   steals time to the deadline task.
> +	 * - The minimum bandwidth requirement for CFS.

rq UCLAMP_MIN can also be driven by RT, not only by CFS.

> +	 */
> +	if (min)
> +		*min = max(irq + cpu_bw_dl(rq), uclamp_rq_get(rq, UCLAMP_MIN));
>
>  	/*
>  	 * Because the time spend on RT/DL tasks is visible as 'lost' time to
>  	 * CFS tasks and we use the same metric to track the effective
>  	 * utilization (PELT windows are synchronized) we can directly add them
>  	 * to obtain the CPU's actual utilization.
> -	 *
> -	 * CFS and RT utilization can be boosted or capped, depending on
> -	 * utilization clamp constraints requested by currently RUNNABLE
> -	 * tasks.
> -	 * When there are no CFS RUNNABLE tasks, clamps are released and
> -	 * frequency will be gracefully reduced with the utilization decay.
>  	 */
>  	util = util_cfs + cpu_util_rt(rq);
> -	if (type == FREQUENCY_UTIL)
> -		util = uclamp_rq_util_with(rq, util, p);
> -
> -	dl_util = cpu_util_dl(rq);
> -
> -	/*
> -	 * For frequency selection we do not make cpu_util_dl() a permanent part
> -	 * of this sum because we want to use cpu_bw_dl() later on, but we need
> -	 * to check if the CFS+RT+DL sum is saturated (ie. no idle time) such
> -	 * that we select f_max when there is no idle time.
> -	 *
> -	 * NOTE: numerical errors or stop class might cause us to not quite hit
> -	 *       saturation when we should -- something for later.
> -	 */
> -	if (util + dl_util >= max)
> -		return max;
> +	util += cpu_util_dl(rq);
> -	/*
> -	 * OTOH, for energy computation we need the estimated running time, so
> -	 * include util_dl and ignore dl_bw.
> -	 */
> -	if (type == ENERGY_UTIL)
> -		util += dl_util;
> +	if (util >= scale) {
> +		if (max)
> +			*max = scale;

But that means that uclamp_max cannot constrain a system in which
'util > uclamp_max'. I guess that's related to you saying uclamp_min is a
hard requirement and uclamp_max is a soft requirement. I don't think that's
in sync with the rest of the uclamp_max implementation.

> +		return scale;
> +	}
>
>  	/*
>  	 * There is still idle time; further improve the number by using the
> @@ -7449,28 +7435,21 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
>  	 *   U' = irq + --------- * U
>  	 *                 max
>  	 */
> -	util = scale_irq_capacity(util, irq, max);
> +	util = scale_irq_capacity(util, irq, scale);
>  	util += irq;
> -	/*
> -	 * Bandwidth required by DEADLINE must always be granted while, for
> -	 * FAIR and RT, we use blocked utilization of IDLE CPUs as a mechanism
> -	 * to gracefully reduce the frequency when no tasks show up for longer
> -	 * periods of time.
> -	 *
> -	 * Ideally we would like to set bw_dl as min/guaranteed freq and util +
> -	 * bw_dl as requested freq. However, cpufreq is not yet ready for such
> -	 * an interface. So, we only do the latter for now.
> +	/* The maximum hint is a soft bandwidth requirement which can be lower
> +	 * than the actual utilization because of max uclamp requirments
>  	 */
> -	if (type == FREQUENCY_UTIL)
> -		util += cpu_bw_dl(rq);
> +	if (max)
> +		*max = min(scale, uclamp_rq_get(rq, UCLAMP_MAX));
>
> -	return min(max, util);
> +	return min(scale, util);
>  }

The new effective_cpu_util() is slightly different from the old
FREQUENCY_UTIL variant (i.e. *min != NULL && *max != NULL).

Missing:

    if (!uclamp_is_used() && rt_rq_is_runnable(&rq->rt))
        return max;

Probably moved into sugov_effective_cpu_perf() (which is only called for
`FREQUENCY_UTIL`)?

old: irq_cap_scaling(util_cfs, util_rt) + irq + cpu_bw_dl()
                                                ^^^^^^^^^^^
new: irq_cap_scaling(util_cfs + util_rt + util_dl) + irq
                                          ^^^^^^^
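To make that old/new difference concrete, a back-of-the-envelope model (plain
userspace C, made-up numbers; scale_irq() only mimics the idea of
scale_irq_capacity()):

    #include <stdio.h>

    static unsigned long scale_irq(unsigned long util, unsigned long irq,
                                   unsigned long max)
    {
            return util * (max - irq) / max;
    }

    int main(void)
    {
            unsigned long max = 1024, irq = 128;
            unsigned long util_cfs = 300, util_rt = 100, util_dl = 150, bw_dl = 200;

            /* old FREQUENCY_UTIL: scale CFS+RT, then add irq and the DL bandwidth */
            unsigned long old = scale_irq(util_cfs + util_rt, irq, max) + irq + bw_dl;

            /* new: scale CFS+RT+DL and add irq; cpu_bw_dl() is reported via *min */
            unsigned long new = scale_irq(util_cfs + util_rt + util_dl, irq, max) + irq;

            printf("old: %lu, new: %lu\n", old, new);
            return 0;
    }

So the DL contribution moves from a raw cpu_bw_dl() add-on on the frequency
side to an irq-scaled util_dl term, with the bandwidth floor coming back via
the *min output.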
[...]

> +unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
> +                                       unsigned long min,
> +                                       unsigned long max)
> +{
> +	unsigned long target;
> +	struct rq *rq = cpu_rq(cpu);
> +
> +	if (rt_rq_is_runnable(&rq->rt))
> +		return max;
> +
> +	/* Provide at least enough capacity for DL + irq */
> +	target = min;
> +
> +	actual = map_util_perf(actual);
> +	/* Actually we don't need to target the max performance */
> +	if (actual < max)
> +		max = actual;
> +
> +	/*
> +	 * Ensure at least minimum performance while providing more compute
> +	 * capacity when possible.
> +	 */
> +	return max(target, max);

Can you not just use:

    return max(min, max);

and skip target?

> +}
> +
>  static void sugov_get_util(struct sugov_cpu *sg_cpu)
>  {
> -	unsigned long util = cpu_util_cfs_boost(sg_cpu->cpu);
> -	struct rq *rq = cpu_rq(sg_cpu->cpu);
> +	unsigned long min, max, util = cpu_util_cfs_boost(sg_cpu->cpu);
>
> -	sg_cpu->bw_dl = cpu_bw_dl(rq);
> -	sg_cpu->util = effective_cpu_util(sg_cpu->cpu, util,
> -					  FREQUENCY_UTIL, NULL);
> +	util = effective_cpu_util(sg_cpu->cpu, util, &min, &max);
> +	sg_cpu->bw_min = map_util_perf(min);
> +	sg_cpu->util = sugov_effective_cpu_perf(sg_cpu->cpu, util, min, max);
>  }
>
>  /**
> @@ -306,7 +329,7 @@ static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
>   */
>  static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu)
>  {
> -	if (cpu_bw_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->bw_dl)
> +	if (cpu_bw_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->bw_min)

bw_min is more than DL, right?

    bw_min = head_room(max(irq + cpu_bw_dl, rq's UCLAMP_MIN))

[...]
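FWIW, a rough standalone model of the new sugov flow quoted above (plain
userspace C; SCALE, the helper names and the example numbers are my own
simplifications, and the RT special case is left out):

    #include <stdio.h>

    #define SCALE 1024UL

    static unsigned long min_ul(unsigned long a, unsigned long b) { return a < b ? a : b; }
    static unsigned long max_ul(unsigned long a, unsigned long b) { return a > b ? a : b; }

    /* ~1.25x cpufreq headroom, in the spirit of map_util_perf() */
    static unsigned long headroom(unsigned long util)
    {
            return util + (util >> 2);
    }

    /* mirrors the quoted sugov_effective_cpu_perf(), minus the RT check */
    static unsigned long effective_perf(unsigned long actual, unsigned long min,
                                        unsigned long max)
    {
            actual = headroom(actual);
            if (actual < max)
                    max = actual;
            return max_ul(min, max);    /* never go below the min floor */
    }

    int main(void)
    {
            /* made-up CPU state: low utilization, big UCLAMP_MIN, no UCLAMP_MAX cap */
            unsigned long util = 200, uclamp_min = 512, uclamp_max = SCALE;
            unsigned long irq_plus_bw_dl = 50;

            unsigned long min = max_ul(irq_plus_bw_dl, uclamp_min); /* the *min output */
            unsigned long max = min_ul(SCALE, uclamp_max);          /* the *max output */

            printf("bw_min = %lu, target perf = %lu\n",
                   headroom(min), effective_perf(util, min, max));
            return 0;
    }

With these inputs the uclamp_min floor wins over the boosted utilization, and
bw_min indeed carries more than just the DL bandwidth.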