From: Vincent Guittot
Date: Mon, 27 Jan 2020 18:16:17 +0100
Subject: Re: [RFC PATCH v4 0/6] sched/cpufreq: Make schedutil energy aware
To: Douglas RAILLARD
Cc: linux-kernel, "Rafael J. Wysocki", viresh kumar, Peter Zijlstra,
    Juri Lelli, Dietmar Eggemann, Quentin Perret, "open list:THERMAL"
In-Reply-To: <20200122173538.1142069-1-douglas.raillard@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 22 Jan 2020 at 18:36, Douglas RAILLARD wrote:
>
> Make schedutil cpufreq governor energy-aware.
>
> - patch 1 introduces a function to retrieve a frequency given a base
>   frequency and an energy cost margin.
> - patch 2 links Energy Model perf_domain to sugov_policy.
> - patch 3 updates get_next_freq() to make use of the Energy Model.
> - patch 4 adds sugov_cpu_ramp_boost() function.
> - patch 5 updates sugov_update_(single|shared)() to make use of
>   sugov_cpu_ramp_boost().
> - patch 6 introduces a tracepoint in get_next_freq() for
>   testing/debugging. Since it's not a trace event, it's not exposed to
>   userspace in a directly usable way, allowing for painless future
>   updates/removal.
>
> The benefits of using the EM in schedutil are twofold:
>
> 1) Selecting the highest possible frequency for a given cost.
> Some platforms can have lower frequencies that are less efficient than
> higher ones, in which case they should be skipped for most purposes.

This makes sense. Why use a lower frequency when a higher one is more
power efficient?

> They can still be useful to give more freedom to thermal throttling
> mechanisms, but not under normal circumstances.
> note: the EM framework will warn about such OPPs "hertz/watts ratio
> non-monotonically decreasing"
>
> 2) Driving the frequency selection with power in mind, in addition to
> maximizing the utilization of the non-idle CPUs in the system.
>
> Point 1) is implemented in "PM: Introduce em_pd_get_higher_freq()" and
> enabled in schedutil by
> "sched/cpufreq: Hook em_pd_get_higher_power() into get_next_freq()".
>
> Point 2) is enabled in
> "sched/cpufreq: Boost schedutil frequency ramp up". It allows using
> higher frequencies when it is known that the true utilization of
> currently running tasks is exceeding their previous stable point.
> The benefits are:
>
> * Boosting the frequency when the behavior of a runnable task changes,
>   leading to an increase in utilization. That shortens the frequency
>   ramp up duration, which in turns allows the utilization signal to
>   reach stable values quicker. Since the allowed frequency boost is
>   bounded in energy, it will behave consistently across platforms,
>   regardless of the OPP cost range.

Could you explain this a bit more?

>
> * The boost is only transient, and should not impact a lot the energy
>   consumed of workloads with very stable utilization signals.
>
> This has been lightly tested with a rtapp task ramping from 10% to 75%
> utilisation on a big core.

Which kind of use case are you targeting?

Do you have some benchmark showing the benefit, and how you can bound
the increase in energy?

The benefit of point 2) is less obvious to me.
We already have uclamp, which helps to override the "utilization" that
is seen by schedutil in order to boost or cap the frequency when some
tasks are running. I'm curious to see what the benefit of this would be
on top of that.

>
> v1 -> v2:
>
> * Split the new sugov_cpu_ramp_boost() from the existing
>   sugov_cpu_is_busy() as they seem to seek a different goal.
>
> * Implement sugov_cpu_ramp_boost() based on CFS util_avg and
>   util_est_enqueued signals, rather than using idle calls count.
>   This makes the ramp boost much more accurate in finding boost
>   opportunities, and give a "continuous" output rather than a boolean.
>
> * Add EM_COST_MARGIN_SCALE=1024 to represent the
>   margin values of em_pd_get_higher_freq().
>
> v2 -> v3:
>
> * Check util_avg >= sg_cpu->util_avg in sugov_cpu_ramp_boost_update()
>   to avoid boosting when the utilization is decreasing.
>
> * Add a tracepoint for testing.
>
> v3 -> v4:
>
> * em_pd_get_higher_freq() now interprets the margin as absolute,
>   rather than relative to the cost of the base frequency.
>
> * Modify misleading comment in em_pd_get_higher_freq() since min_freq
>   can actually be higher than the max available frequency in normal
>   operations.
>
> Douglas RAILLARD (6):
>   PM: Introduce em_pd_get_higher_freq()
>   sched/cpufreq: Attach perf domain to sugov policy
>   sched/cpufreq: Hook em_pd_get_higher_power() into get_next_freq()
>   sched/cpufreq: Introduce sugov_cpu_ramp_boost
>   sched/cpufreq: Boost schedutil frequency ramp up
>   sched/cpufreq: Add schedutil_em_tp tracepoint
>
>  include/linux/energy_model.h     |  56 ++++++++++++++
>  include/trace/events/power.h     |   9 +++
>  kernel/sched/cpufreq_schedutil.c | 124 +++++++++++++++++++++++++++++--
>  3 files changed, 182 insertions(+), 7 deletions(-)
>
> --
> 2.24.1
>
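
For readers following the thread, the two mechanisms under discussion can be
sketched in plain user-space C. This is only an illustration of the ideas as
described in the cover letter, not the code from the patches: struct opp,
pick_higher_freq(), ramp_boost() and COST_MARGIN_SCALE are hypothetical names,
and two details are assumptions made for the sketch. First, the margin is
taken as a fraction of the highest cost in the table, in units of
1/COST_MARGIN_SCALE. Second, the boost is taken as the difference between
util_avg and util_est_enqueued.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an Energy Model entry; not a kernel type. */
struct opp {
	unsigned long freq_khz;	/* operating frequency */
	unsigned long cost;	/* energy cost at this frequency, arbitrary unit */
};

/* Mirrors EM_COST_MARGIN_SCALE=1024 from the cover letter. */
#define COST_MARGIN_SCALE 1024UL

/*
 * Point 1): return the highest frequency whose cost stays within the cost
 * of the lowest OPP satisfying min_freq, plus an absolute margin.
 * The table must be sorted by increasing frequency.
 * Assumption: the margin is a fraction of the highest cost in the table.
 */
unsigned long pick_higher_freq(const struct opp *table, size_t n,
			       unsigned long min_freq,
			       unsigned long cost_margin)
{
	unsigned long max_cost = 0, budget;
	size_t base, i;

	assert(n > 0);

	/* Highest cost in the table, used to make the margin absolute. */
	for (i = 0; i < n; i++)
		if (table[i].cost > max_cost)
			max_cost = table[i].cost;

	/*
	 * First OPP at or above the request. min_freq may exceed the
	 * highest available frequency, so clamp to the last entry.
	 */
	for (base = 0; base < n - 1; base++)
		if (table[base].freq_khz >= min_freq)
			break;

	budget = table[base].cost + cost_margin * max_cost / COST_MARGIN_SCALE;

	/* Scan from the top: the first OPP within budget is the answer. */
	for (i = n; i-- > base; )
		if (table[i].cost <= budget)
			return table[i].freq_khz;

	return table[base].freq_khz;	/* not reached: base is within budget */
}

/*
 * Point 2): a continuous ramp boost value. It is non-zero only while
 * util_avg is not decreasing and exceeds util_est_enqueued.
 * Assumption: the boost is the plain difference of the two signals.
 */
unsigned long ramp_boost(unsigned long util_avg,
			 unsigned long prev_util_avg,
			 unsigned long util_est_enqueued)
{
	if (util_avg >= prev_util_avg && util_avg > util_est_enqueued)
		return util_avg - util_est_enqueued;

	return 0;
}
```

With a table in which 500 MHz costs 100 and 800 MHz costs 90, a request for
400 MHz with a zero margin returns 800 MHz, since the lower frequency is the
less efficient one. A non-zero margin widens the budget, which is how a ramp
boost value could be turned into a bounded step up in frequency.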