Date: Fri, 17 May 2024 12:06:45 +0100
Subject: Re: [PATCH v4] sched: Consolidate cpufreq updates
To: Qais Yousef, "Rafael J. Wysocki", Viresh Kumar, Ingo Molnar,
	Peter Zijlstra, Vincent Guittot, Juri Lelli
Wysocki" , Viresh Kumar , Ingo Molnar , Peter Zijlstra , Vincent Guittot , Juri Lelli Cc: Steven Rostedt , Dietmar Eggemann , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , Valentin Schneider , Christian Loehle , linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org References: <20240516204802.846520-1-qyousef@layalina.io> Content-Language: en-US From: Hongyan Xia In-Reply-To: <20240516204802.846520-1-qyousef@layalina.io> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit On 16/05/2024 21:48, Qais Yousef wrote: > Improve the interaction with cpufreq governors by making the > cpufreq_update_util() calls more intentional. > > At the moment we send them when load is updated for CFS, bandwidth for > DL and at enqueue/dequeue for RT. But this can lead to too many updates > sent in a short period of time and potentially be ignored at a critical > moment due to the rate_limit_us in schedutil. > > For example, simultaneous task enqueue on the CPU where 2nd task is > bigger and requires higher freq. The trigger to cpufreq_update_util() by > the first task will lead to dropping the 2nd request until tick. Or > another CPU in the same policy triggers a freq update shortly after. > > Updates at enqueue for RT are not strictly required. Though they do help > to reduce the delay for switching the frequency and the potential > observation of lower frequency during this delay. But current logic > doesn't intentionally (at least to my understanding) try to speed up the > request. > > To help reduce the amount of cpufreq updates and make them more > purposeful, consolidate them into these locations: > > 1. context_switch() > 2. task_tick_fair() > 3. update_blocked_averages() > 4. on syscall that changes policy or uclamp values > > The update at context switch should help guarantee that DL and RT get > the right frequency straightaway when they're RUNNING. As mentioned > though the update will happen slightly after enqueue_task(); though in > an ideal world these tasks should be RUNNING ASAP and this additional > delay should be negligible. For fair tasks we need to make sure we send > a single update for every decay for the root cfs_rq. Any changes to the > rq will be deferred until the next task is ready to run, or we hit TICK. > But we are guaranteed the task is running at a level that meets its > requirements after enqueue. > > To guarantee RT and DL tasks updates are never missed, we add a new > SCHED_CPUFREQ_FORCE_UPDATE to ignore the rate_limit_us. If we are > already running at the right freq, the governor will end up doing > nothing, but we eliminate the risk of the task ending up accidentally > running at the wrong freq due to rate_limit_us. > > Similarly for iowait boost, we ignore rate limits. We also handle a case > of a boost reset prematurely by adding a guard in sugov_iowait_apply() > to reduce the boost after 1ms which seems iowait boost mechanism relied > on rate_limit_us and cfs_rq.decay preventing any updates to happen soon > after iowait boost. > > The new SCHED_CPUFREQ_FORCE_UPDATE should not impact the rate limit > time stamps otherwise we can end up delaying updates for normal > requests. > > As a simple optimization, we avoid sending cpufreq updates when > switching from RT to another RT as RT tasks run at max freq by default. 
>
> If CONFIG_UCLAMP_TASK is enabled, we do a simple check to see if
> uclamp_min is different before forcing an update. Most RT tasks are
> likely to be running at the same performance level, so we can avoid the
> unnecessary overhead of forced updates when there's nothing to do.
>
> We also ignore cpufreq updates for sugov workers at context switch. It
> doesn't make sense for the kworker that applies the frequency update
> (which is a DL task) to trigger a frequency update itself.
>
> The update at task_tick_fair() will guarantee that the governor follows
> any updates to load for tasks/CPU or due to new enqueues/dequeues to
> the rq. Since DL and RT always run at constant frequencies and have no
> load tracking, this is only required for fair tasks.
>
> The update at update_blocked_averages() will ensure we decay the
> frequency as the CPU becomes idle for long enough.
>
> If the currently running task changes its policy or uclamp values, we
> follow up with a cpufreq update to account for any new perf
> requirements resulting from the change.
>
> [...]
>
> diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
> index bdd31ab93bc5..2d0a45aba16f 100644
> --- a/include/linux/sched/cpufreq.h
> +++ b/include/linux/sched/cpufreq.h
> @@ -8,7 +8,8 @@
>   * Interface between cpufreq drivers and the scheduler:
>   */
>  
> -#define SCHED_CPUFREQ_IOWAIT	(1U << 0)
> +#define SCHED_CPUFREQ_IOWAIT		(1U << 0)
> +#define SCHED_CPUFREQ_FORCE_UPDATE	(1U << 1) /* ignore transition_delay_us */
>  
>  #ifdef CONFIG_CPU_FREQ
>  struct cpufreq_policy;
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 1a914388144a..d0c97a66627a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -152,6 +152,9 @@ const_debug unsigned int sysctl_sched_nr_migrate = SCHED_NR_MIGRATE_BREAK;
>  
>  __read_mostly int scheduler_running;
>  
> +static __always_inline void
> +update_cpufreq_ctx_switch(struct rq *rq, struct task_struct *prev);
> +
>  #ifdef CONFIG_SCHED_CORE
>  
>  DEFINE_STATIC_KEY_FALSE(__sched_core_enabled);
> @@ -1958,7 +1961,7 @@ static bool uclamp_reset(const struct sched_attr *attr,
>  	return false;
>  }
>  
> -static void __setscheduler_uclamp(struct task_struct *p,
> +static void __setscheduler_uclamp(struct rq *rq, struct task_struct *p,
>  				  const struct sched_attr *attr)
>  {
>  	enum uclamp_id clamp_id;
> @@ -1980,7 +1983,6 @@ static void __setscheduler_uclamp(struct task_struct *p,
>  		value = uclamp_none(clamp_id);
>  
>  		uclamp_se_set(uc_se, value, false);
> -
>  	}
>  
>  	if (likely(!(attr->sched_flags & SCHED_FLAG_UTIL_CLAMP)))
> @@ -1997,6 +1999,13 @@ static void __setscheduler_uclamp(struct task_struct *p,
>  		uclamp_se_set(&p->uclamp_req[UCLAMP_MAX],
>  			      attr->sched_util_max, true);
>  	}
> +
> +	/*
> +	 * Updating uclamp values has impact on freq, ensure it is taken into
> +	 * account.
> +	 */
> +	if (task_current(rq, p))
> +		update_cpufreq_ctx_switch(rq, NULL);

Do we care about updating the frequency here? p is dequeued during the
__setscheduler_uclamp() call, so I think it's better to do this after
the __setscheduler_uclamp() call and after enqueue_task(), so that
uclamp_rq_inc() comes into effect.

Also, do we want to limit the update to task_current()? Updating the
uclamp_min of a task on this rq (even though it is not current) should
raise the minimum OPP for the whole rq. For example, if a task with a
nonzero uclamp_min gets enqueued, its uclamp_min should kick in even if
this task isn't immediately run and the current task isn't this task.
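To make the suggestion concrete, here is a rough, untested sketch of the
ordering I have in mind in __sched_setscheduler() (the exact call site
and the unconditional update are my assumptions, not something this
patch does):

	if (queued)
		/* uclamp_rq_inc() runs here with p's new clamp values */
		enqueue_task(rq, p, queue_flags);
	if (running)
		set_next_task(rq, p);

	/*
	 * The rq-wide uclamp aggregate now reflects p's new values, so
	 * the governor sees the right minimum OPP even when p is not
	 * the current task.
	 */
	update_cpufreq_ctx_switch(rq, NULL);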
> }
>
>  static void uclamp_fork(struct task_struct *p)
> [...]
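One more thought on the SCHED_CPUFREQ_FORCE_UPDATE semantics described
in the changelog (skip the rate limit without touching the rate-limit
time stamps). Below is only a minimal sketch of how a governor could
honour that, loosely modelled on schedutil's sugov_should_update_freq();
the extra flags argument and the placement of the check are my
assumptions, not what this patch implements:

static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time,
				     unsigned int flags)
{
	s64 delta_ns;

	if (!cpufreq_this_cpu_can_update(sg_policy->policy))
		return false;

	/*
	 * A forced update skips the rate limit check, but deliberately
	 * does not reset last_freq_update_time, so normal requests are
	 * not delayed further.
	 */
	if (flags & SCHED_CPUFREQ_FORCE_UPDATE)
		return true;

	if (unlikely(sg_policy->limits_changed)) {
		sg_policy->limits_changed = false;
		sg_policy->need_freq_update = true;
		return true;
	}

	delta_ns = time - sg_policy->last_freq_update_time;

	return delta_ns >= sg_policy->freq_update_delay_ns;
}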