Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp737094rwb; Thu, 8 Dec 2022 02:13:14 -0800 (PST) X-Google-Smtp-Source: AA0mqf5Iw7+JZAOI5qcm7/0/IwYj0JWMS4P/O7QlQ+zdjq1dfPRvsiUabzHUNUI0N9ll26NdENFG X-Received: by 2002:a63:1659:0:b0:46e:f23a:e9aa with SMTP id 25-20020a631659000000b0046ef23ae9aamr68496239pgw.428.1670494394603; Thu, 08 Dec 2022 02:13:14 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670494394; cv=none; d=google.com; s=arc-20160816; b=Rop38Gy+XgSjwQkJ7tjiZJDm8L8RxsGKq5uqVoOMnVnFsAD0XxeLMBWCvMpL/DFz5w YkoIpV7DcrspybCSeiVpxcN5m7RLwJR8j8PMXVgvd40SktjkFF+pSRCBcZjR25WVcRBv MTQCQiqDm1DG8z+R9cZDHgZfI0AhcSUrC4sQuruD+3IzFZQCNYz/QccK5qOK+m5ksZ1j S7Qq45WKZQ5PC8V0IiZPQaFlhfLzISUoJudJjFYP7OFXWF+HK7lVq1wGCLNG4puUON03 Z8cvY4/pa+w/dSZr6ft26YP+T4FnxEaT6Ng9hpfvlcBgANStDln/CtwFrueu0rOZwcxk /rIw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id; bh=AKLfFHMw7UNq3A8vr+VztGIJacd9vPtbxKm/5d+cl7w=; b=A01yIDNajlY6EMd1WFr1npUE6YOqf6b0ww77LR2eXrqsey+frfeS3aH+OwnXPFD4+A ZontxJB9bdKNA5UU0c5Xz2NwCEnqybyjbCSTI1+m0nGY21XlkcCOewKRUeobzPy7jkL3 +0VaStrBRGa3cYyPMjeCXknxat/RLwewHySkS8VKzgTFgTbfHiXGvzd9+BREYnDakRvY EU6NKlp7OYSeY3+m9v/rpiptPxRRptsPC/9FqxBX/9hm0/mE+XmhIBVF42xGyfcVWIcj WKaFOS9gxg8e9+Yun8kpDqFqE+dQi1XjwNRCzEatYcfMFqf7X8VFN04jWZKET+c8uThc 3lKg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id i3-20020a170902cf0300b00189aee21a03si13852659plg.423.2022.12.08.02.13.04; Thu, 08 Dec 2022 02:13:14 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230042AbiLHKGf (ORCPT + 74 others); Thu, 8 Dec 2022 05:06:35 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49770 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229940AbiLHKGb (ORCPT ); Thu, 8 Dec 2022 05:06:31 -0500 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 4EEE653EC0; Thu, 8 Dec 2022 02:06:29 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 23ED723A; Thu, 8 Dec 2022 02:06:36 -0800 (PST) Received: from [10.57.9.192] (unknown [10.57.9.192]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 971E73F73B; Thu, 8 Dec 2022 02:06:26 -0800 (PST) Message-ID: <7428f6dd-9403-180b-d4b4-7ef1aee3dcb1@arm.com> Date: Thu, 8 Dec 2022 10:06:24 +0000 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.2.2 Subject: Re: [PATCH v2 2/2] cpufreq: schedutil: Optimize operations with single max CPU capacity Content-Language: en-US To: Vincent Guittot Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org, dietmar.eggemann@arm.com, saravanak@google.com, wusamuel@google.com, isaacmanjarres@google.com, kernel-team@android.com, juri.lelli@redhat.com, peterz@infradead.org, mingo@redhat.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, viresh.kumar@linaro.org References: <20221207101705.9460-1-lukasz.luba@arm.com> <20221207101705.9460-3-lukasz.luba@arm.com> From: Lukasz Luba In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-4.5 required=5.0 tests=BAYES_00,NICE_REPLY_A, RCVD_IN_DNSWL_MED,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 12/8/22 08:37, Vincent Guittot wrote: > On Wed, 7 Dec 2022 at 11:17, Lukasz Luba wrote: >> >> The max CPU capacity is the same for all CPUs sharing frequency domain >> and thus 'policy' object. There is a way to avoid heavy operations >> in a loop for each CPU by leveraging this knowledge. Thus, simplify >> the looping code in the sugov_next_freq_shared() and drop heavy >> multiplications. Instead, use simple max() to get the highest utilization >> from these CPUs. This is useful for platforms with many (4 or 6) little >> CPUs. >> >> The max CPU capacity must be fetched every time we are called, due to >> difficulties during the policy setup, where we are not able to get the >> normalized CPU capacity at the right time. >> >> The stored value in sugov_policy::max is also than used in >> sugov_iowait_apply() to calculate the right boost. Thus, that field is >> useful to have in that sugov_policy struct. >> >> Signed-off-by: Lukasz Luba >> --- >> kernel/sched/cpufreq_schedutil.c | 22 +++++++++++----------- >> 1 file changed, 11 insertions(+), 11 deletions(-) >> >> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c >> index c19d6de67b7a..f9881f3d9488 100644 >> --- a/kernel/sched/cpufreq_schedutil.c >> +++ b/kernel/sched/cpufreq_schedutil.c >> @@ -158,10 +158,8 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy, >> >> static void sugov_get_util(struct sugov_cpu *sg_cpu) >> { >> - struct sugov_policy *sg_policy = sg_cpu->sg_policy; >> struct rq *rq = cpu_rq(sg_cpu->cpu); >> >> - sg_policy->max = arch_scale_cpu_capacity(sg_cpu->cpu); >> sg_cpu->bw_dl = cpu_bw_dl(rq); >> sg_cpu->util = effective_cpu_util(sg_cpu->cpu, cpu_util_cfs(sg_cpu->cpu), >> FREQUENCY_UTIL, NULL); >> @@ -317,6 +315,8 @@ static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu) >> static inline bool sugov_update_single_common(struct sugov_cpu *sg_cpu, >> u64 time, unsigned int flags) >> { >> + struct sugov_policy *sg_policy = sg_cpu->sg_policy; >> + >> sugov_iowait_boost(sg_cpu, time, flags); >> sg_cpu->last_update = time; >> >> @@ -325,6 +325,9 @@ static inline bool sugov_update_single_common(struct sugov_cpu *sg_cpu, >> if (!sugov_should_update_freq(sg_cpu->sg_policy, time)) >> return false; >> >> + /* Fetch the latest CPU capcity to avoid stale data */ >> + sg_policy->max = arch_scale_cpu_capacity(sg_cpu->cpu); >> + >> sugov_get_util(sg_cpu); >> sugov_iowait_apply(sg_cpu, time); >> >> @@ -414,25 +417,22 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time) >> { >> struct sugov_policy *sg_policy = sg_cpu->sg_policy; >> struct cpufreq_policy *policy = sg_policy->policy; >> - unsigned long util = 0, max = 1; >> + unsigned long util = 0; >> unsigned int j; >> >> + /* Fetch the latest CPU capcity to avoid stale data */ >> + sg_policy->max = arch_scale_cpu_capacity(sg_cpu->cpu); >> + >> for_each_cpu(j, policy->cpus) { >> struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j); >> - unsigned long j_util, j_max; >> >> sugov_get_util(j_sg_cpu); >> sugov_iowait_apply(j_sg_cpu, time); >> - j_util = j_sg_cpu->util; >> - j_max = j_sg_cpu->max; >> >> - if (j_util * max > j_max * util) { >> - util = j_util; >> - max = j_max; >> - } > > With the code removed above, max is only used in 2 places: > - sugov_iowait_apply > - map_util_freq > > I wonder if it would be better to just call arch_scale_cpu_capacity() > in these 2 places instead of saving a copy in sg_policy and then > reading it twice. The sugov_iowait_apply() is called in that loop, so probably I will add a new argument to that call and just feed it with the capacity value from one CPU, which was read before the loop. So, similarly what is in this patch. Otherwise, all of those per-cpu capacity vars would be accessed inside the sugov_iowait_apply() with sg_cpu->cpu. > > arch_scaleu_cpu_capacity is already a per_cpu variable so accessing it > should be pretty cheap. Yes and no, as you said this is per-cpu variable and would access them from one CPU, which is running that loop. They will have different pages and addresses so cache lines on that CPU. to avoiding trashing a cache lines on this running CPU let's read that capacity once, before the loop. Let's use the new arg to pass that value via one of the registers. In such, only one cache line would have to fetch that data into. So I thought this simple sg_policy->max would do the trick w/o a lot of hassle. > > Thought ? > I can change that and drop the sg_policy->max and call differently those capacity values. I will have to unfortunately drop Viresh's ACKs, since this will be a way different code. Thanks Vincent for the suggestion. Do you want me to go further with such approach and send a v3?