2020-11-30 18:42:03

by Rafael J. Wysocki

Subject: [RFC][PATCH 0/2] cpufreq: Allow drivers to receive more information from the governor

Hi,

Using intel_pstate in the passive mode with HWP enabled, in particular under
the schedutil governor, is still kind of problematic, because it has to assume
that it should not allow the frequency to fall below the one requested by the
governor. For this reason, it translates the target frequency into HWP.REQ.MIN
which generally causes the processor to run a bit too fast.

Moreover, this allows the HWP algorithm to use any frequency between the target
one and HWP.REQ.MAX (which corresponds to the policy max limit), and some
workloads cause it to go for the max turbo frequency prematurely, which hurts
energy-efficiency without improving performance, even though the schedutil
governor itself would not allow the frequency to ramp up that fast.

This patch series attempts to improve the situation by introducing a new driver
callback allowing the driver to receive more information from the governor. In
particular, this allows the min (required) and target (desired) performance
levels to be passed to it and those can be used to give better hints to the
hardware.
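
For reference, the new callback introduced by patch [1/2] has the following
shape (quoted here from that patch for convenience; the values are abstract
performance levels in units of the given CPU's capacity, not frequencies):

	void (*adjust_perf)(unsigned int cpu, bool busy,
			    unsigned long min_perf,
			    unsigned long target_perf,
			    unsigned long capacity);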

Patch [1/2] updates the core and the schedutil governor and patch [2/2] modifies
intel_pstate to use the new callback when appropriate.

Please refer to the patch changelogs for details.

Thanks!




2020-11-30 18:42:34

by Rafael J. Wysocki

Subject: [RFC][PATCH 2/2] cpufreq: intel_pstate: Implement the ->adjust_perf() callback

From: Rafael J. Wysocki <[email protected]>

Make intel_pstate expose the ->adjust_perf() callback when it
operates in the passive mode with HWP enabled, which causes the
schedutil governor to use that callback instead of ->fast_switch().

The minimum and target performance-level values passed by the
governor to ->adjust_perf() are converted to HWP.REQ.MIN and
HWP.REQ.DESIRED, respectively, which allows the processor to
adjust its configuration to maximize energy-efficiency while
providing sufficient capacity.

The "busy" argument of ->adjust_perf() is used to omit updates
that are likely to cause performance to drop below the expected
level.
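
Schematically (this is only an illustration, not part of the patch), the
conversion carried out by intel_cpufreq_adjust_perf() below scales the
abstract values into the P-state range of the given CPU:

	/* cap_pstate: turbo (or max non-turbo, if turbo is disabled) P-state */
	target_pstate = DIV_ROUND_UP(cap_pstate * target_perf, capacity);
	min_pstate = DIV_ROUND_UP(cap_pstate * min_perf, capacity);
	/* e.g. cap_pstate = 40, capacity = 1024, target_perf = 512 -> 20 */

and, after clamping to the policy limits, min_pstate is written to HWP.REQ.MIN
and target_pstate to HWP.REQ.DESIRED.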

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/cpufreq/intel_pstate.c | 77 ++++++++++++++++++++++++++++++++++-------
1 file changed, 65 insertions(+), 12 deletions(-)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -2526,20 +2526,19 @@ static void intel_cpufreq_trace(struct c
fp_toint(cpu->iowait_boost * 100));
}

-static void intel_cpufreq_adjust_hwp(struct cpudata *cpu, u32 target_pstate,
- bool strict, bool fast_switch)
+static void intel_cpufreq_adjust_hwp(struct cpudata *cpu, u32 min, u32 max,
+ u32 desired, bool fast_switch)
{
u64 prev = READ_ONCE(cpu->hwp_req_cached), value = prev;

value &= ~HWP_MIN_PERF(~0L);
- value |= HWP_MIN_PERF(target_pstate);
+ value |= HWP_MIN_PERF(min);

- /*
- * The entire MSR needs to be updated in order to update the HWP min
- * field in it, so opportunistically update the max too if needed.
- */
value &= ~HWP_MAX_PERF(~0L);
- value |= HWP_MAX_PERF(strict ? target_pstate : cpu->max_perf_ratio);
+ value |= HWP_MAX_PERF(max);
+
+ value &= ~HWP_DESIRED_PERF(~0L);
+ value |= HWP_DESIRED_PERF(desired);

if (value == prev)
return;
@@ -2569,11 +2568,15 @@ static int intel_cpufreq_update_pstate(s
int old_pstate = cpu->pstate.current_pstate;

target_pstate = intel_pstate_prepare_request(cpu, target_pstate);
- if (hwp_active)
- intel_cpufreq_adjust_hwp(cpu, target_pstate,
- policy->strict_target, fast_switch);
- else if (target_pstate != old_pstate)
+ if (hwp_active) {
+ int max_pstate = policy->strict_target ?
+ target_pstate : cpu->max_perf_ratio;
+
+ intel_cpufreq_adjust_hwp(cpu, target_pstate, max_pstate, 0,
+ fast_switch);
+ } else if (target_pstate != old_pstate) {
intel_cpufreq_adjust_perf_ctl(cpu, target_pstate, fast_switch);
+ }

cpu->pstate.current_pstate = target_pstate;

@@ -2634,6 +2637,54 @@ static unsigned int intel_cpufreq_fast_s
return target_pstate * cpu->pstate.scaling;
}

+static void intel_cpufreq_adjust_perf(unsigned int cpunum, bool busy,
+ unsigned long min_perf,
+ unsigned long target_perf,
+ unsigned long capacity)
+{
+ struct cpudata *cpu = all_cpu_data[cpunum];
+ int old_pstate = cpu->pstate.current_pstate;
+ int cap_pstate, min_pstate, max_pstate, target_pstate;
+
+ update_turbo_state();
+ cap_pstate = global.turbo_disabled ? cpu->pstate.max_pstate :
+ cpu->pstate.turbo_pstate;
+
+ /* Optimization: Avoid unnecessary divisions. */
+
+ target_pstate = cap_pstate;
+ if (target_perf < capacity) {
+ target_pstate = DIV_ROUND_UP(cap_pstate * target_perf, capacity);
+ /*
+ * Do not reduce the target P-state if the CPU has been busy
+ * recently, as the reduction is likely to be premature then.
+ */
+ if (target_pstate < old_pstate && busy)
+ target_pstate = old_pstate;
+ }
+
+ min_pstate = cap_pstate;
+ if (min_perf < capacity)
+ min_pstate = DIV_ROUND_UP(cap_pstate * min_perf, capacity);
+
+ if (min_pstate < cpu->pstate.min_pstate)
+ min_pstate = cpu->pstate.min_pstate;
+
+ if (min_pstate < cpu->min_perf_ratio)
+ min_pstate = cpu->min_perf_ratio;
+
+ max_pstate = min(cap_pstate, cpu->max_perf_ratio);
+ if (max_pstate < min_pstate)
+ max_pstate = min_pstate;
+
+ target_pstate = clamp_t(int, target_pstate, min_pstate, max_pstate);
+
+ intel_cpufreq_adjust_hwp(cpu, min_pstate, max_pstate, target_pstate, true);
+
+ cpu->pstate.current_pstate = target_pstate;
+ intel_cpufreq_trace(cpu, INTEL_PSTATE_TRACE_FAST_SWITCH, old_pstate);
+}
+
static int intel_cpufreq_cpu_init(struct cpufreq_policy *policy)
{
int max_state, turbo_max, min_freq, max_freq, ret;
@@ -3032,6 +3083,8 @@ static int __init intel_pstate_init(void
intel_pstate.attr = hwp_cpufreq_attrs;
intel_cpufreq.attr = hwp_cpufreq_attrs;
intel_cpufreq.flags |= CPUFREQ_NEED_UPDATE_LIMITS;
+ intel_cpufreq.fast_switch = NULL;
+ intel_cpufreq.adjust_perf = intel_cpufreq_adjust_perf;
if (!default_driver)
default_driver = &intel_pstate;




2020-11-30 18:42:38

by Rafael J. Wysocki

Subject: [RFC][PATCH 1/2] cpufreq: Add special-purpose fast-switching callback for drivers

From: Rafael J. Wysocki <[email protected]>

First off, some cpufreq drivers (eg. intel_pstate) can pass hints
beyond the current target frequency to the hardware and there are no
provisions for doing that in the cpufreq framework. In particular,
today the driver has to assume that it should allow the frequency to
fall below the one requested by the governor (or the required capacity
may not be provided) which may not be the case and which may lead to
excessive energy usage in some scenarios.

Second, the hints passed by these drivers to the hardware neeed not
be in terms of the frequency, so representing the utilization numbers
coming from the scheduler as frequency before passing them to those
drivers is not really useful.

Address the two points above by adding a special-purpose replacement
for the ->fast_switch callback, called ->adjust_perf, allowing the
governor to pass abstract performance level (rather than frequency)
values for the minimum (required) and target (desired) performance
along with the information whether or not the given CPU has been
busy since the last update (which may allow the driver to skip the
update in some cases).

Also update the schedutil governor to use the new callback instead
of ->fast_switch if present.
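
For illustration, with ->adjust_perf() available the governor-side fast path
boils down to a single call (sketch based on the schedutil change below):

	if (sg_policy->direct_fast_switch) {
		cpufreq_driver_adjust_perf(sg_cpu->cpu,
					   sugov_cpu_is_busy(sg_cpu),
					   map_util_perf(sg_cpu->bw_dl),	/* min (required) */
					   map_util_perf(util),		/* target (desired) */
					   max);			/* CPU capacity */
		sg_policy->last_freq_update_time = time;
		return;
	}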

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/cpufreq/cpufreq.c | 41 +++++++++++++++++++++++++++++++++++++++
include/linux/cpufreq.h | 14 +++++++++++++
include/linux/sched/cpufreq.h | 5 ++++
kernel/sched/cpufreq_schedutil.c | 40 ++++++++++++++++++++++++++------------
4 files changed, 88 insertions(+), 12 deletions(-)

Index: linux-pm/include/linux/cpufreq.h
===================================================================
--- linux-pm.orig/include/linux/cpufreq.h
+++ linux-pm/include/linux/cpufreq.h
@@ -320,6 +320,15 @@ struct cpufreq_driver {
unsigned int index);
unsigned int (*fast_switch)(struct cpufreq_policy *policy,
unsigned int target_freq);
+ /*
+ * ->fast_switch() replacement for drivers that use an internal
+ * representation of performance levels and can pass hints other than
+ * the target performance level to the hardware.
+ */
+ void (*adjust_perf)(unsigned int cpu, bool busy,
+ unsigned long min_perf,
+ unsigned long target_perf,
+ unsigned long capacity);

/*
* Caches and returns the lowest driver-supported frequency greater than
@@ -588,6 +597,11 @@ struct cpufreq_governor {
/* Pass a target to the cpufreq driver */
unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
unsigned int target_freq);
+void cpufreq_driver_adjust_perf(unsigned int cpu, bool busy,
+ unsigned long min_perf,
+ unsigned long target_perf,
+ unsigned long capacity);
+bool cpufreq_driver_has_adjust_perf(void);
int cpufreq_driver_target(struct cpufreq_policy *policy,
unsigned int target_freq,
unsigned int relation);
Index: linux-pm/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq.c
+++ linux-pm/drivers/cpufreq/cpufreq.c
@@ -2094,6 +2094,47 @@ unsigned int cpufreq_driver_fast_switch(
}
EXPORT_SYMBOL_GPL(cpufreq_driver_fast_switch);

+/**
+ * cpufreq_driver_adjust_perf - Adjust CPU performance level in one go.
+ * @cpu: Target CPU.
+ * @busy: Whether or not @cpu has been busy since the previous update.
+ * @min_perf: Minimum (required) performance level (units of @capacity).
+ * @target_perf: Target (desired) performance level (units of @capacity).
+ * @capacity: Capacity of the target CPU.
+ *
+ * Carry out a fast performance level switch of @cpu without sleeping.
+ *
+ * The driver's ->adjust_perf() callback invoked by this function must be
+ * suitable for being called from within RCU-sched read-side critical sections
+ * and it is expected to select a suitable performance level equal to or above
+ * @min_perf and preferably equal to or below @target_perf.
+ *
+ * This function must not be called if policy->fast_switch_enabled is unset.
+ *
+ * Governors calling this function must guarantee that it will never be invoked
+ * twice in parallel for the same CPU and that it will never be called in
+ * parallel with either ->target() or ->target_index() or ->fast_switch() for
+ * the same CPU.
+ */
+void cpufreq_driver_adjust_perf(unsigned int cpu, bool busy,
+ unsigned long min_perf,
+ unsigned long target_perf,
+ unsigned long capacity)
+{
+ cpufreq_driver->adjust_perf(cpu, busy, min_perf, target_perf, capacity);
+}
+
+/**
+ * cpufreq_driver_has_adjust_perf - Check "direct fast switch" callback.
+ *
+ * Return 'true' if the ->adjust_perf callback is present for the
+ * current driver or 'false' otherwise.
+ */
+bool cpufreq_driver_has_adjust_perf(void)
+{
+ return !!cpufreq_driver->adjust_perf;
+}
+
/* Must set freqs->new to intermediate frequency */
static int __target_intermediate(struct cpufreq_policy *policy,
struct cpufreq_freqs *freqs, int index)
Index: linux-pm/kernel/sched/cpufreq_schedutil.c
===================================================================
--- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
+++ linux-pm/kernel/sched/cpufreq_schedutil.c
@@ -40,6 +40,7 @@ struct sugov_policy {
struct task_struct *thread;
bool work_in_progress;

+ bool direct_fast_switch;
bool limits_changed;
bool need_freq_update;
};
@@ -454,6 +455,25 @@ static void sugov_update_single(struct u
util = sugov_get_util(sg_cpu);
max = sg_cpu->max;
util = sugov_iowait_apply(sg_cpu, time, util, max);
+
+ /*
+ * This code runs under rq->lock for the target CPU, so it won't run
+ * concurrently on two different CPUs for the same target and it is not
+ * necessary to acquire the lock in the fast switch case.
+ */
+ if (sg_policy->direct_fast_switch) {
+ /*
+ * In this case, any optimizations that can be done are up to
+ * the driver.
+ */
+ cpufreq_driver_adjust_perf(sg_cpu->cpu,
+ sugov_cpu_is_busy(sg_cpu),
+ map_util_perf(sg_cpu->bw_dl),
+ map_util_perf(util), max);
+ sg_policy->last_freq_update_time = time;
+ return;
+ }
+
next_f = get_next_freq(sg_policy, util, max);
/*
* Do not reduce the frequency if the CPU has not been idle
@@ -466,11 +486,6 @@ static void sugov_update_single(struct u
sg_policy->cached_raw_freq = cached_freq;
}

- /*
- * This code runs under rq->lock for the target CPU, so it won't run
- * concurrently on two different CPUs for the same target and it is not
- * necessary to acquire the lock in the fast switch case.
- */
if (sg_policy->policy->fast_switch_enabled) {
sugov_fast_switch(sg_policy, time, next_f);
} else {
@@ -655,10 +670,6 @@ static int sugov_kthread_create(struct s
struct cpufreq_policy *policy = sg_policy->policy;
int ret;

- /* kthread only required for slow path */
- if (policy->fast_switch_enabled)
- return 0;
-
kthread_init_work(&sg_policy->work, sugov_work);
kthread_init_worker(&sg_policy->worker);
thread = kthread_create(kthread_worker_fn, &sg_policy->worker,
@@ -736,9 +747,14 @@ static int sugov_init(struct cpufreq_pol
goto disable_fast_switch;
}

- ret = sugov_kthread_create(sg_policy);
- if (ret)
- goto free_sg_policy;
+ if (policy->fast_switch_enabled) {
+ sg_policy->direct_fast_switch = cpufreq_driver_has_adjust_perf();
+ } else {
+ /* kthread only required for slow path */
+ ret = sugov_kthread_create(sg_policy);
+ if (ret)
+ goto free_sg_policy;
+ }

mutex_lock(&global_tunables_lock);

Index: linux-pm/include/linux/sched/cpufreq.h
===================================================================
--- linux-pm.orig/include/linux/sched/cpufreq.h
+++ linux-pm/include/linux/sched/cpufreq.h
@@ -28,6 +28,11 @@ static inline unsigned long map_util_fre
{
return (freq + (freq >> 2)) * util / cap;
}
+
+static inline unsigned long map_util_perf(unsigned long util)
+{
+ return util + (util >> 2);
+}
#endif /* CONFIG_CPU_FREQ */

#endif /* _LINUX_SCHED_CPUFREQ_H */
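
For example, map_util_perf() above applies the usual 25% headroom to the
utilization value, in the same abstract units, so for util = 410 on a CPU
with a capacity of 1024:

	map_util_perf(410) = 410 + (410 >> 2) = 512	/* ~50% of the capacity */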



2020-12-02 16:03:39

by Doug Smythies

Subject: RE: [RFC][PATCH 1/2] cpufreq: Add special-purpose fast-switching callback for drivers

Hi Rafael,

On 2020.11.30 10:37 Rafael J. Wysocki wrote:

> First off, some cpufreq drivers (eg. intel_pstate) can pass hints
> beyond the current target frequency to the hardware and there are no
> provisions for doing that in the cpufreq framework. In particular,
> today the driver has to assume that it should allow the frequency to

Forgot the important "not":

today the driver has to assume that it should not allow the frequency to

> fall below the one requested by the governor (or the required capacity
> may not be provided) which may not be the case and which may lead to
> excessive energy usage in some scenarios.
>
> Second, the hints passed by these drivers to the hardware neeed not

s/neeed/need

...

O.K. this is good.

The problem with my basic CPU frequency versus load test with the
schedutil governor is that it is always so oscillatory that it is pretty
much impossible to conclude anything. So I re-worked the test
to look at Processor Package Power load instead.

In a previous e-mail [1] I had reported the power differences
for one periodic load at one frequency, as an (apparently cherry-picked)
example. Quoted:

> schedutil governor:
> acpi-cpufreq: good
> intel_cpufreq hwp: bad <<<<< Now good, with this patch set.
> intel_cpufreq no hwp: good
> ...
> periodic workflow at 347 hertz.
> ~36% load at 4.60 GHz (where hwp operates)
> ~55% load at 3.2 GHz (where no hwp operates)
>
> intel_cpufreq hwp: 9.6 processor package watts. 45.8 watts on the mains to the computer.
> intel_cpufreq no hwp: ~6 processor package watts. ~41 watts on the mains to the computer. (noisy)

So this time, I only have power/energy data, and a relatively easy way to compress all 12,000
samples into some concise summary is to simply find the average power for the entire experiment:

Legend:
hwp: Kernel 5.10-rc6, HWP enabled; intel_cpufreq; schedutil (always)
rjw: Kernel 5.10-rc6 + this patch set, HWP enabled; intel_cpufreq; schedutil
no-hwp: Kernel 5.10-rc6, HWP disabled; intel_cpufreq; schedutil
acpi-cpufreq: Kernel 5.10-rc6, HWP disabled; acpi-cpufreq; schedutil

load work/sleep frequency: 73 Hertz:
hwp: Average: 12.00822 watts
rjw: Average: 10.18089 watts
no-hwp: Average: 10.21947 watts
acpi-cpufreq: Average: 9.06585 watts

load work/sleep frequency: 113 Hertz:

hwp: Average: 12.01056
rjw: Average: 10.12303
no-hwp: Average: 10.08228
acpi-cpufreq: Average: 9.02215

load work/sleep frequency: 211 Hertz:

hwp: Average: 12.16067
rjw: Average: 10.24413
no-hwp: Average: 10.12463
acpi-cpufreq: Average: 9.19175

load work/sleep frequency: 347 Hertz:

hwp: Average: 12.34169
rjw: Average: 10.79980
no-hwp: Average: 10.57296
acpi-cpufreq: Average: 9.84709

load work/sleep frequency: 401 Hertz:

hwp: Average: 12.42562
rjw: Average: 11.12465
no-hwp: Average: 11.24203
acpi-cpufreq: Average: 10.78670

[1] https://marc.info/?l=linux-pm&m=159769839401767&w=2

My tests results graphs:
Note: I have to encode the web site, or I get hammered by bots.
Note: it is .com only because it was less expensive than .org
73 Hertz:
Double u double u double u dot smythies dot .com/~doug/linux/s18/hwp/k510-rc6/su73/
113 Hertz:
Double u double u double u dot smythies dot .com/~doug/linux/s18/hwp/k510-rc6/su113/
211 Hertz:
Double u double u double u dot smythies dot .com/~doug/linux/s18/hwp/k510-rc6/su211/
347 Hertz:
Double u double u double u dot smythies dot .com/~doug/linux/s18/hwp/k510-rc6/su347/
401 Hertz:
Double u double u double u dot smythies dot .com/~doug/linux/s18/hwp/k510-rc6/su401/

... Doug



2020-12-02 17:42:50

by Rafael J. Wysocki

Subject: Re: [RFC][PATCH 1/2] cpufreq: Add special-purpose fast-switching callback for drivers

On Wed, Dec 2, 2020 at 4:59 PM Doug Smythies <[email protected]> wrote:
>
> Hi Rafael,
>
> On 2020.11.30 10:37 Rafael J. Wysocki wrote:
>
> > First off, some cpufreq drivers (eg. intel_pstate) can pass hints
> > beyond the current target frequency to the hardware and there are no
> > provisions for doing that in the cpufreq framework. In particular,
> > today the driver has to assume that it should allow the frequency to
>
> Forgot the important "not":

Right, thanks for noticing that!

> today the driver has to assume that it should not allow the frequency to
>
> > fall below the one requested by the governor (or the required capacity
> > may not be provided) which may not be the case and which may lead to
> > excessive energy usage in some scenarios.
> >
> > Second, the hints passed by these drivers to the hardware neeed not
>
> s/neeed/need

Yup, thanks!

> ...
>
> O.K. this is good.
>
> The problem with my basic CPU frequency versus load test with the
> schedutil governor is that it is always so oscillatory that it is pretty
> much impossible to conclude anything. So I re-worked the test
> to look at Processor Package Power load instead.
>
> In a previous e-mail [1] I had reported the power differences
> for one periodic load at one frequency, as an (apparently cherry-picked)
> example. Quoted:
>
> > schedutil governor:
> > acpi-cpufreq: good
> > intel_cpufreq hwp: bad <<<<< Now good, with this patch set.

OK, great!

> > intel_cpufreq no hwp: good
> > ...
> > periodic workflow at 347 hertz.
> > ~36% load at 4.60 GHz (where hwp operates)
> > ~55% load at 3.2 GHz (where no hwp operates)
> >
> > intel_cpufreq hwp: 9.6 processor package watts. 45.8 watts on the mains to the computer.
> > intel_cpufreq no hwp: ~6 processor package watts. ~41 watts on the mains to the computer. (noisy)
>
> So this time, I only have power/energy data, and a relatively easy way to compress all 12,000
> samples into some concise summary is to simply find the average power for the entire experiment:
>
> Legend:
> hwp: Kernel 5.10-rc6, HWP enabled; intel_cpufreq; schedutil (always)
> rjw: Kernel 5.10-rc6 + this patch set, HWP enabled; intel_cpufreq; schedutil
> no-hwp: Kernel 5.10-rc6, HWP disabled; intel_cpufreq; schedutil
> acpi-cpufreq: Kernel 5.10-rc6, HWP disabled; acpi-cpufreq; schedutil
>
> load work/sleep frequency: 73 Hertz:
> hwp: Average: 12.00822 watts
> rjw: Average: 10.18089 watts
> no-hwp: Average: 10.21947 watts
> acpi-cpufreq: Average: 9.06585 watts
>
> load work/sleep frequency: 113 Hertz:
>
> hwp: Average: 12.01056
> rjw: Average: 10.12303
> no-hwp: Average: 10.08228
> acpi-cpufreq: Average: 9.02215
>
> load work/sleep frequency: 211 Hertz:
>
> hwp: Average: 12.16067
> rjw: Average: 10.24413
> no-hwp: Average: 10.12463
> acpi-cpufreq: Average: 9.19175
>
> load work/sleep frequency: 347 Hertz:
>
> hwp: Average: 12.34169
> rjw: Average: 10.79980
> no-hwp: Average: 10.57296
> acpi-cpufreq: Average: 9.84709
>
> load work/sleep frequency: 401 Hertz:
>
> hwp: Average: 12.42562
> rjw: Average: 11.12465
> no-hwp: Average: 11.24203
> acpi-cpufreq: Average: 10.78670
>
> [1] https://marc.info/?l=linux-pm&m=159769839401767&w=2
>
> My tests results graphs:
> Note: I have to encode the web site, or I get hammered by bots.
> Note: it is .com only because it was less expensive than .org
> 73 Hertz:
> Double u double u double u dot smythies dot .com/~doug/linux/s18/hwp/k510-rc6/su73/
> 113 Hertz:
> Double u double u double u dot smythies dot .com/~doug/linux/s18/hwp/k510-rc6/su113/
> 211 Hertz:
> Double u double u double u dot smythies dot .com/~doug/linux/s18/hwp/k510-rc6/su211/
> 347 Hertz:
> Double u double u double u dot smythies dot .com/~doug/linux/s18/hwp/k510-rc6/su347/
> 401 Hertz:
> Double u double u double u dot smythies dot .com/~doug/linux/s18/hwp/k510-rc6/su401/

Thanks for the data, this is encouraging!

2020-12-03 12:46:46

by Peter Zijlstra

Subject: Re: [RFC][PATCH 1/2] cpufreq: Add special-purpose fast-switching callback for drivers

On Mon, Nov 30, 2020 at 07:37:01PM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> First off, some cpufreq drivers (eg. intel_pstate) can pass hints
> beyond the current target frequency to the hardware and there are no

Everything CPPC, which is quite a bit these days.


> + /*
> + * ->fast_switch() replacement for drivers that use an internal
> + * representation of performance levels and can pass hints other than
> + * the target performance level to the hardware.
> + */
> + void (*adjust_perf)(unsigned int cpu, bool busy,
> + unsigned long min_perf,
> + unsigned long target_perf,
> + unsigned long capacity);
>

I'm not sure @busy makes sense, that's more a hack because @util had a
dip and should remain inside schedutil.


> @@ -454,6 +455,25 @@ static void sugov_update_single(struct u
> util = sugov_get_util(sg_cpu);
> max = sg_cpu->max;
> util = sugov_iowait_apply(sg_cpu, time, util, max);
> +
> + /*
> + * This code runs under rq->lock for the target CPU, so it won't run
> + * concurrently on two different CPUs for the same target and it is not
> + * necessary to acquire the lock in the fast switch case.
> + */
> + if (sg_policy->direct_fast_switch) {
> + /*
> + * In this case, any optimizations that can be done are up to
> + * the driver.
> + */
> + cpufreq_driver_adjust_perf(sg_cpu->cpu,
> + sugov_cpu_is_busy(sg_cpu),
> + map_util_perf(sg_cpu->bw_dl),
> + map_util_perf(util), max);
> + sg_policy->last_freq_update_time = time;
> + return;
> + }

Instead of adding more branches, would it make sense to simply set a
whole different util_hook in this case?

2020-12-03 14:47:05

by Rafael J. Wysocki

Subject: Re: [RFC][PATCH 1/2] cpufreq: Add special-purpose fast-switching callback for drivers

On Thu, Dec 3, 2020 at 1:42 PM Peter Zijlstra <[email protected]> wrote:
>
> On Mon, Nov 30, 2020 at 07:37:01PM +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <[email protected]>
> >
> > First off, some cpufreq drivers (eg. intel_pstate) can pass hints
> > beyond the current target frequency to the hardware and there are no
>
> Everything CPPC, which is quite a bit these days.

Right, but that is still "some". :-) I can add it to the list of
examples, though.

> > + /*
> > + * ->fast_switch() replacement for drivers that use an internal
> > + * representation of performance levels and can pass hints other than
> > + * the target performance level to the hardware.
> > + */
> > + void (*adjust_perf)(unsigned int cpu, bool busy,
> > + unsigned long min_perf,
> > + unsigned long target_perf,
> > + unsigned long capacity);
> >
>
> I'm not sure @busy makes sense, that's more a hack because @util had a
> dip and should remain inside schedutil.

So I did it this way, because schedutil would need to store the old
value of target_perf for this and intel_pstate already does that.

But if a new util_hook is used in this case, the existing space in
sg_policy may be used for that.

> > @@ -454,6 +455,25 @@ static void sugov_update_single(struct u
> > util = sugov_get_util(sg_cpu);
> > max = sg_cpu->max;
> > util = sugov_iowait_apply(sg_cpu, time, util, max);
> > +
> > + /*
> > + * This code runs under rq->lock for the target CPU, so it won't run
> > + * concurrently on two different CPUs for the same target and it is not
> > + * necessary to acquire the lock in the fast switch case.
> > + */
> > + if (sg_policy->direct_fast_switch) {
> > + /*
> > + * In this case, any optimizations that can be done are up to
> > + * the driver.
> > + */
> > + cpufreq_driver_adjust_perf(sg_cpu->cpu,
> > + sugov_cpu_is_busy(sg_cpu),
> > + map_util_perf(sg_cpu->bw_dl),
> > + map_util_perf(util), max);
> > + sg_policy->last_freq_update_time = time;
> > + return;
> > + }
>
> Instead of adding more branches, would it make sense to simply set a
> whole different util_hook in this case?

Looks doable without too much code duplication. Lemme try.
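
For the record, the rough idea would be something like this (the _perf
variant of the update function is hypothetical here, names to be decided):

	/* in sugov_start(), select the update hook depending on the driver */
	uu = cpufreq_driver_has_adjust_perf() ? sugov_update_single_perf :
						sugov_update_single;
	cpufreq_add_update_util_hook(cpu, &sg_cpu->update_util, uu);

so that the ->adjust_perf() case gets its own update function and the extra
branch goes away from sugov_update_single().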

2020-12-07 07:49:51

by Viresh Kumar

Subject: Re: [RFC][PATCH 1/2] cpufreq: Add special-purpose fast-switching callback for drivers

On 30-11-20, 19:37, Rafael J. Wysocki wrote:
> Index: linux-pm/include/linux/cpufreq.h
> ===================================================================
> --- linux-pm.orig/include/linux/cpufreq.h
> +++ linux-pm/include/linux/cpufreq.h
> @@ -320,6 +320,15 @@ struct cpufreq_driver {
> unsigned int index);
> unsigned int (*fast_switch)(struct cpufreq_policy *policy,
> unsigned int target_freq);
> + /*
> + * ->fast_switch() replacement for drivers that use an internal
> + * representation of performance levels and can pass hints other than
> + * the target performance level to the hardware.
> + */
> + void (*adjust_perf)(unsigned int cpu, bool busy,

Maybe this should still take policy as an argument (like other calls)
instead of CPU, even if it is going to be used for single-cpu per
policy case for now.

> + unsigned long min_perf,
> + unsigned long target_perf,
> + unsigned long capacity);

--
viresh

2020-12-07 13:46:01

by Rafael J. Wysocki

Subject: Re: [RFC][PATCH 1/2] cpufreq: Add special-purpose fast-switching callback for drivers

On Mon, Dec 7, 2020 at 8:47 AM Viresh Kumar <[email protected]> wrote:
>
> On 30-11-20, 19:37, Rafael J. Wysocki wrote:
> > Index: linux-pm/include/linux/cpufreq.h
> > ===================================================================
> > --- linux-pm.orig/include/linux/cpufreq.h
> > +++ linux-pm/include/linux/cpufreq.h
> > @@ -320,6 +320,15 @@ struct cpufreq_driver {
> > unsigned int index);
> > unsigned int (*fast_switch)(struct cpufreq_policy *policy,
> > unsigned int target_freq);
> > + /*
> > + * ->fast_switch() replacement for drivers that use an internal
> > + * representation of performance levels and can pass hints other than
> > + * the target performance level to the hardware.
> > + */
> > + void (*adjust_perf)(unsigned int cpu, bool busy,
>
> Maybe this should still take policy as an argument (like other calls)
> instead of CPU, even if it is going to be used for single-cpu per
> policy case for now.

That can be changed in the future if need be.

Otherwise this path doesn't need to look at the policy object at all
and I'd rather keep it this way.

>
> > + unsigned long min_perf,
> > + unsigned long target_perf,
> > + unsigned long capacity);
>
> --