Hi,
With Android UI and benchmarks, the latency of the cpufreq response to
certain scheduling events can become very critical. Currently, callbacks
into schedutil are only made from the scheduler if the target CPU of the
event is the same as the current CPU. This means there are certain
situations where a target CPU may not run schedutil for some time.
One testcase to show this behavior is where a task starts running on
CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the
system is configured such that new tasks should receive maximum demand
initially, this should result in CPU0 increasing frequency immediately.
Because of the above mentioned limitation though this does not occur.
This is verified using ftrace with the sample [1] application.
Maybe the ideal solution is to always allow remote callbacks but that
has its own challenges:
o There is no protection required for the single-CPU-per-policy case
today, and adding any kind of locking there just to support remote
callbacks isn't really a good idea.
o If the local CPU isn't part of the same cpufreq policy as the target
CPU, then we wouldn't be able to do fast switching at all and would have
to use some kind of bottom half to schedule work on the target CPU to do
the real switching. That may be overkill as well.
And so this series only allows remote callbacks for target CPUs that
share the cpufreq policy with the local CPU.
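Roughly, the check the governors end up doing looks like this (just a
sketch for illustration; the real hunks are in patches 2 and 3):
/* Allow the update only if the local CPU shares the target's policy */
static bool update_allowed_locally(struct cpufreq_policy *policy)
{
        return cpumask_test_cpu(smp_processor_id(), policy->cpus);
}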
This series is tested with a couple of use cases (Android: hackbench,
recentfling, galleryfling, vellamo; Ubuntu: hackbench) on the ARM Hikey
board (64-bit octa-core, single policy). Only galleryfling showed minor
improvements, while the others didn't show much deviation.
The reason is that this patchset only targets a corner case, where all
of the following need to be true to see an improvement, and that
doesn't happen too often with these tests:
- Task is migrated to another CPU.
- The task has maximum demand initially, and should take the CPU to
higher OPPs.
- And the target CPU doesn't call into schedutil until the next tick.
V2->V3:
- Rearranged/merged patches as suggested by Rafael (looks much better
now)
- Also handle the new hook added to the intel_pstate driver.
- The final code remains the same as V2, except for the above hook.
V1->V2:
- Don't support remote callbacks for unshared cpufreq policies.
- Don't support remote callbacks where local CPU isn't part of the
target CPU's cpufreq policy.
- Dropped dvfs_possible_from_any_cpu flag.
--
viresh
[1] http://pastebin.com/7LkMSRxE
Viresh Kumar (3):
sched: cpufreq: Allow remote cpufreq callbacks
cpufreq: schedutil: Process remote callback for shared policies
cpufreq: governor: Process remote callback for shared policies
drivers/cpufreq/cpufreq_governor.c | 4 ++++
drivers/cpufreq/intel_pstate.c | 8 ++++++++
include/linux/sched/cpufreq.h | 1 +
kernel/sched/cpufreq.c | 1 +
kernel/sched/cpufreq_schedutil.c | 19 ++++++++++++++-----
kernel/sched/deadline.c | 2 +-
kernel/sched/fair.c | 8 +++++---
kernel/sched/rt.c | 2 +-
kernel/sched/sched.h | 10 ++--------
9 files changed, 37 insertions(+), 18 deletions(-)
--
2.13.0.71.gd7076ec9c9cb
We do not call cpufreq callbacks from the scheduler core for remote
(non-local) CPUs currently. But there are cases where such remote
callbacks are useful, especially in the case of shared cpufreq policies.
This patch updates the scheduler core to call the cpufreq callbacks for
remote CPUs as well.
For now, all the registered utilization update callbacks are updated to
return early if a remote callback is detected. That is, this patch just
moves the decision making down the hierarchy.
Later patches would enable remote callbacks for shared policies.
Based on initial work from Steve Muckle.
Signed-off-by: Steve Muckle <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>
---
drivers/cpufreq/cpufreq_governor.c | 4 ++++
drivers/cpufreq/intel_pstate.c | 8 ++++++++
include/linux/sched/cpufreq.h | 1 +
kernel/sched/cpufreq.c | 1 +
kernel/sched/cpufreq_schedutil.c | 8 ++++++++
kernel/sched/deadline.c | 2 +-
kernel/sched/fair.c | 8 +++++---
kernel/sched/rt.c | 2 +-
kernel/sched/sched.h | 10 ++--------
9 files changed, 31 insertions(+), 13 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 47e24b5384b3..606b1a37a1af 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -275,6 +275,10 @@ static void dbs_update_util_handler(struct update_util_data *data, u64 time,
struct policy_dbs_info *policy_dbs = cdbs->policy_dbs;
u64 delta_ns, lst;
+ /* Don't allow remote callbacks */
+ if (smp_processor_id() != data->cpu)
+ return;
+
/*
* The work may not be allowed to be queued up right now.
* Possible reasons:
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index b7fb8b7c980d..4bee2f4cbc28 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -1732,6 +1732,10 @@ static void intel_pstate_update_util_pid(struct update_util_data *data,
struct cpudata *cpu = container_of(data, struct cpudata, update_util);
u64 delta_ns = time - cpu->sample.time;
+ /* Don't allow remote callbacks */
+ if (smp_processor_id() != data->cpu)
+ return;
+
if ((s64)delta_ns < pid_params.sample_rate_ns)
return;
@@ -1749,6 +1753,10 @@ static void intel_pstate_update_util(struct update_util_data *data, u64 time,
struct cpudata *cpu = container_of(data, struct cpudata, update_util);
u64 delta_ns;
+ /* Don't allow remote callbacks */
+ if (smp_processor_id() != data->cpu)
+ return;
+
if (flags & SCHED_CPUFREQ_IOWAIT) {
cpu->iowait_boost = int_tofp(1);
} else if (cpu->iowait_boost) {
diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
index d2be2ccbb372..8256a8f35f22 100644
--- a/include/linux/sched/cpufreq.h
+++ b/include/linux/sched/cpufreq.h
@@ -16,6 +16,7 @@
#ifdef CONFIG_CPU_FREQ
struct update_util_data {
void (*func)(struct update_util_data *data, u64 time, unsigned int flags);
+ unsigned int cpu;
};
void cpufreq_add_update_util_hook(int cpu, struct update_util_data *data,
diff --git a/kernel/sched/cpufreq.c b/kernel/sched/cpufreq.c
index dbc51442ecbc..ee4c596b71b4 100644
--- a/kernel/sched/cpufreq.c
+++ b/kernel/sched/cpufreq.c
@@ -42,6 +42,7 @@ void cpufreq_add_update_util_hook(int cpu, struct update_util_data *data,
return;
data->func = func;
+ data->cpu = cpu;
rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), data);
}
EXPORT_SYMBOL_GPL(cpufreq_add_update_util_hook);
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 29a397067ffa..ed9c589e5386 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -218,6 +218,10 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
unsigned int next_f;
bool busy;
+ /* Remote callbacks aren't allowed for policies which aren't shared */
+ if (smp_processor_id() != hook->cpu)
+ return;
+
sugov_set_iowait_boost(sg_cpu, time, flags);
sg_cpu->last_update = time;
@@ -290,6 +294,10 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
unsigned long util, max;
unsigned int next_f;
+ /* Don't allow remote callbacks */
+ if (smp_processor_id() != hook->cpu)
+ return;
+
sugov_get_util(&util, &max);
raw_spin_lock(&sg_policy->update_lock);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index a84299f44b5d..7fcfaee39d19 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1136,7 +1136,7 @@ static void update_curr_dl(struct rq *rq)
}
/* kick cpufreq (see the comment in kernel/sched/sched.h). */
- cpufreq_update_this_cpu(rq, SCHED_CPUFREQ_DL);
+ cpufreq_update_util(rq, SCHED_CPUFREQ_DL);
schedstat_set(curr->se.statistics.exec_max,
max(curr->se.statistics.exec_max, delta_exec));
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c95880e216f6..d378d02fdfcb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3278,7 +3278,9 @@ static inline void set_tg_cfs_propagate(struct cfs_rq *cfs_rq) {}
static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq)
{
- if (&this_rq()->cfs == cfs_rq) {
+ struct rq *rq = rq_of(cfs_rq);
+
+ if (&rq->cfs == cfs_rq) {
/*
* There are a few boundary cases this might miss but it should
* get called often enough that that should (hopefully) not be
@@ -3295,7 +3297,7 @@ static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq)
*
* See cpu_util().
*/
- cpufreq_update_util(rq_of(cfs_rq), 0);
+ cpufreq_update_util(rq, 0);
}
}
@@ -4875,7 +4877,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
* passed.
*/
if (p->in_iowait)
- cpufreq_update_this_cpu(rq, SCHED_CPUFREQ_IOWAIT);
+ cpufreq_update_util(rq, SCHED_CPUFREQ_IOWAIT);
for_each_sched_entity(se) {
if (se->on_rq)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 45caf937ef90..0af5ca9e3e3f 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -970,7 +970,7 @@ static void update_curr_rt(struct rq *rq)
return;
/* Kick cpufreq (see the comment in kernel/sched/sched.h). */
- cpufreq_update_this_cpu(rq, SCHED_CPUFREQ_RT);
+ cpufreq_update_util(rq, SCHED_CPUFREQ_RT);
schedstat_set(curr->se.statistics.exec_max,
max(curr->se.statistics.exec_max, delta_exec));
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index eeef1a3086d1..aa9d5b87b4f8 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2070,19 +2070,13 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
{
struct update_util_data *data;
- data = rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data));
+ data = rcu_dereference_sched(*per_cpu_ptr(&cpufreq_update_util_data,
+ cpu_of(rq)));
if (data)
data->func(data, rq_clock(rq), flags);
}
-
-static inline void cpufreq_update_this_cpu(struct rq *rq, unsigned int flags)
-{
- if (cpu_of(rq) == smp_processor_id())
- cpufreq_update_util(rq, flags);
-}
#else
static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
-static inline void cpufreq_update_this_cpu(struct rq *rq, unsigned int flags) {}
#endif /* CONFIG_CPU_FREQ */
#ifdef arch_scale_freq_capacity
--
2.13.0.71.gd7076ec9c9cb
This patch updates the legacy governors (ondemand/conservative) to
process cpufreq utilization update hooks called for remote CPUs.
Proper locking is already in place for shared policies, so nothing
extra needs to be done.
Based on initial work from Steve Muckle.
Signed-off-by: Viresh Kumar <[email protected]>
---
drivers/cpufreq/cpufreq_governor.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 606b1a37a1af..0b49fc8bb91d 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -275,8 +275,8 @@ static void dbs_update_util_handler(struct update_util_data *data, u64 time,
struct policy_dbs_info *policy_dbs = cdbs->policy_dbs;
u64 delta_ns, lst;
- /* Don't allow remote callbacks */
- if (smp_processor_id() != data->cpu)
+ /* Allow remote callbacks only on the CPUs sharing cpufreq policy */
+ if (!cpumask_test_cpu(smp_processor_id(), policy_dbs->policy->cpus))
return;
/*
--
2.13.0.71.gd7076ec9c9cb
This patch updates the schedutil governor to process cpufreq utilization
update hooks called for remote CPUs.
The schedutil governor already has proper locking in place for shared
policy update hooks, so nothing extra needs to be done.
Based on initial work from Steve Muckle.
Signed-off-by: Viresh Kumar <[email protected]>
---
kernel/sched/cpufreq_schedutil.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index ed9c589e5386..2599e7e7a82c 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -154,12 +154,12 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
return cpufreq_driver_resolve_freq(policy, freq);
}
-static void sugov_get_util(unsigned long *util, unsigned long *max)
+static void sugov_get_util(unsigned long *util, unsigned long *max, int cpu)
{
- struct rq *rq = this_rq();
+ struct rq *rq = cpu_rq(cpu);
unsigned long cfs_max;
- cfs_max = arch_scale_cpu_capacity(NULL, smp_processor_id());
+ cfs_max = arch_scale_cpu_capacity(NULL, cpu);
*util = min(rq->cfs.avg.util_avg, cfs_max);
*max = cfs_max;
@@ -233,7 +233,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
if (flags & SCHED_CPUFREQ_RT_DL) {
next_f = policy->cpuinfo.max_freq;
} else {
- sugov_get_util(&util, &max);
+ sugov_get_util(&util, &max, hook->cpu);
sugov_iowait_boost(sg_cpu, &util, &max);
next_f = get_next_freq(sg_policy, util, max);
/*
@@ -291,14 +291,15 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
{
struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
struct sugov_policy *sg_policy = sg_cpu->sg_policy;
+ struct cpufreq_policy *policy = sg_policy->policy;
unsigned long util, max;
unsigned int next_f;
- /* Don't allow remote callbacks */
- if (smp_processor_id() != hook->cpu)
+ /* Allow remote callbacks only on the CPUs sharing cpufreq policy */
+ if (!cpumask_test_cpu(smp_processor_id(), policy->cpus))
return;
- sugov_get_util(&util, &max);
+ sugov_get_util(&util, &max, hook->cpu);
raw_spin_lock(&sg_policy->update_lock);
--
2.13.0.71.gd7076ec9c9cb
On Thu, Jul 13, 2017 at 8:44 AM, Viresh Kumar <[email protected]> wrote:
> Hi,
>
> With Android UI and benchmarks the latency of cpufreq response to
> certain scheduling events can become very critical. Currently, callbacks
> into schedutil are only made from the scheduler if the target CPU of the
> event is the same as the current CPU. This means there are certain
> situations where a target CPU may not run schedutil for some time.
>
> One testcase to show this behavior is where a task starts running on
> CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the
> system is configured such that new tasks should receive maximum demand
> initially, this should result in CPU0 increasing frequency immediately.
> Because of the above mentioned limitation though this does not occur.
> This is verified using ftrace with the sample [1] application.
>
> Maybe the ideal solution is to always allow remote callbacks but that
> has its own challenges:
>
> o There is no protection required for single CPU per policy case today,
> and adding any kind of locking there, to supply remote callbacks,
> isn't really a good idea.
>
> o If is local CPU isn't part of the same cpufreq policy as the target
> CPU, then we wouldn't be able to do fast switching at all and have to
> use some kind of bottom half to schedule work on the target CPU to do
> real switching. That may be overkill as well.
>
>
> And so this series only allows remote callbacks for target CPUs that
> share the cpufreq policy with the local CPU.
>
> This series is tested with couple of usecases (Android: hackbench,
> recentfling, galleryfling, vellamo, Ubuntu: hackbench) on ARM hikey
> board (64 bit octa-core, single policy). Only galleryfling showed minor
> improvements, while others didn't had much deviation.
>
> The reason being that this patchset only targets a corner case, where
> following are required to be true to improve performance and that
> doesn't happen too often with these tests:
>
> - Task is migrated to another CPU.
> - The task has maximum demand initially, and should take the CPU to
> higher OPPs.
> - And the target CPU doesn't call into schedutil until the next tick.
I don't have any problems with this series at this point, so you can add
Acked-by: Rafael J. Wysocki <[email protected]>
to the patches.
I can't apply them without ACKs from Peter or Ingo, though.
Thanks,
Rafael
On 07/12/2017 11:44 PM, Viresh Kumar wrote:
> This patch updates the schedutil governor to process cpufreq utilization
> update hooks called for remote CPUs.
>
> The schedutil governor already has proper locking in place for shared
> policy update hooks and nothing extra is required to be done.
>
> Based on initial work from Steve Muckle.
>
> Signed-off-by: Viresh Kumar <[email protected]>
> ---
> kernel/sched/cpufreq_schedutil.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index ed9c589e5386..2599e7e7a82c 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -154,12 +154,12 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
> return cpufreq_driver_resolve_freq(policy, freq);
> }
>
> -static void sugov_get_util(unsigned long *util, unsigned long *max)
> +static void sugov_get_util(unsigned long *util, unsigned long *max, int cpu)
> {
> - struct rq *rq = this_rq();
> + struct rq *rq = cpu_rq(cpu);
> unsigned long cfs_max;
>
> - cfs_max = arch_scale_cpu_capacity(NULL, smp_processor_id());
> + cfs_max = arch_scale_cpu_capacity(NULL, cpu);
>
> *util = min(rq->cfs.avg.util_avg, cfs_max);
> *max = cfs_max;
> @@ -233,7 +233,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> if (flags & SCHED_CPUFREQ_RT_DL) {
> next_f = policy->cpuinfo.max_freq;
> } else {
> - sugov_get_util(&util, &max);
> + sugov_get_util(&util, &max, hook->cpu);
> sugov_iowait_boost(sg_cpu, &util, &max);
> next_f = get_next_freq(sg_policy, util, max);
> /*
> @@ -291,14 +291,15 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> {
> struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
> struct sugov_policy *sg_policy = sg_cpu->sg_policy;
> + struct cpufreq_policy *policy = sg_policy->policy;
> unsigned long util, max;
> unsigned int next_f;
>
> - /* Don't allow remote callbacks */
> - if (smp_processor_id() != hook->cpu)
> + /* Allow remote callbacks only on the CPUs sharing cpufreq policy */
> + if (!cpumask_test_cpu(smp_processor_id(), policy->cpus))
> return;
Honestly, this seems like such a chip/platform-specific decision.
There's no reason that one can't have a chip where you can change the
frequency of any CPU from any other CPU. If there's such a limitation,
we should let that be handled at the CPU freq driver level instead of
having to know about any of that in the scheduler. Heck, in the worst
case, the CPU freq driver can send an IPI and execute that work on the
CPU of interest.
In all Qualcomm chipsets (well, at least the ones that have been used in
Android devices so far), we can switch the frequency of any CPU from any
other CPU. If we can do that even without fast switching, why would
any theoretical fast switching be incapable of supporting this? Is this
a limitation specific to x86 that we are assuming all architectures and
platforms are going to have?
-Saravana
--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
On 13-07-17, 19:02, Saravana Kannan wrote:
> Honestly, this seems like such a chip/platform specific decision. There's no
> reason that one can't have a chip where you can change the frequency of any
> CPU from any other CPU. If there's such a limitation, we should let that be
> handled at the CPU freq driver level instead of having to know about any of
> that at the scheduler. Heck, at worst case, the CPU freq driver can send an
> IPI and execute that work on the CPU of interest.
>
> In all Qualcomm chipsets (well, at least the ones that have been used in
> Android devices so far), we can switch the frequency of any CPU from any
> other CPU. If we can do that even without fast switching, why wouldn't any
> theoretical fast switching be incapable of supporting this? Is this a
> limitation specific to x86 that we are assuming all architectures and
> platforms are going to have?
The default assumption in cpufreq core is that any CPU from a policy
can change freq for that policy. Yes, we surely have cases where any
CPU can change freq of any other CPU (even in different policies).
Perhaps all ARM platforms are like that, not sure.
And so I added a special flag for that in my previous version, but the
idea here is to get a simple solution merged first and then we can
have a separate patch later to support freq switching from all CPUs.
--
viresh
On Thu, Jul 13, 2017 at 07:02:37PM -0700, Saravana Kannan wrote:
> In all Qualcomm chipsets (well, at least the ones that have been used in
> Android devices so far), we can switch the frequency of any CPU from any
> other CPU. If we can do that even without fast switching, why wouldn't any
> theoretical fast switching be incapable of supporting this? Is this a
> limitation specific to x86 that we are assuming all architectures and
> platforms are going to have?
So the typical implementation of fast switching we're thinking of is the
CPU writing the DVFS request into a machine register. Now machine
registers are typically per logical CPU.
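As a rough illustration (a sketch loosely modelled on intel_pstate's
fast switch path; the helper name is made up):
static void example_fast_switch(unsigned int target_pstate)
{
        /* hypothetical helper turning a P-state into a PERF_CTL value */
        u64 val = example_pstate_to_perf_ctl(target_pstate);
        /* Writes the MSR of *this* CPU only; a remote CPU can't do it directly. */
        wrmsrl(MSR_IA32_PERF_CTL, val);
}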
What style of fast switching were you thinking of?
On Fri, Jul 14, 2017 at 10:33:20AM +0530, Viresh Kumar wrote:
> On 13-07-17, 19:02, Saravana Kannan wrote:
> > Honestly, this seems like such a chip/platform specific decision. There's no
> > reason that one can't have a chip where you can change the frequency of any
> > CPU from any other CPU. If there's such a limitation, we should let that be
> > handled at the CPU freq driver level instead of having to know about any of
> > that at the scheduler. Heck, at worst case, the CPU freq driver can send an
> > IPI and execute that work on the CPU of interest.
> >
> > In all Qualcomm chipsets (well, at least the ones that have been used in
> > Android devices so far), we can switch the frequency of any CPU from any
> > other CPU. If we can do that even without fast switching, why wouldn't any
> > theoretical fast switching be incapable of supporting this?
I spoke with Sudeep since the last email; and the proposed interface
(SCMI) is a shmem/mailbox one, which can indeed change the frequency of
another CPU.
> > Is this a limitation specific to x86 that we are assuming all
> > architectures and platforms are going to have?
You are right in that x86 cannot do this. Sending IPIs is fairly
expensive though :/
> The default assumption in cpufreq core is that any CPU from a policy
> can change freq for that policy. Yes, we surely have cases where any
> CPU can change freq of any other CPU (even in different policies).
> Perhaps all ARM platforms are like that, not sure.
>
> And so I added a special flag for that in my previous version, but the
> idea here is to get a simple solution merged first and then we can
> have a separate patch later to support freq switching from all CPUs.
I was going to write things about serialization here. How allowing remote
updates would require extra serialization, in part inspired by your
Changelog comment that says cpufreq-shared provides the required
serialization.
But I think that's all nonsense... That is, yes cpufreq-shared needs
additional serialization, but that's not relevant for the problem at
hand.
The thing is, all of this cpufreq_update_*() crud is called with the
relevant rq->lock held. So all those calls are already fully serialized
between CPUs.
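As a rough illustration of that (simplified, not a hunk from this series):
static void example_update_curr(struct rq *rq)
{
        lockdep_assert_held(&rq->lock);         /* every caller holds this */
        /* Hence serialized per cpu_of(rq), whether called locally or remotely. */
        cpufreq_update_util(rq, 0);
}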
On 20/07/17 13:22, Peter Zijlstra wrote:
> On Thu, Jul 13, 2017 at 07:02:37PM -0700, Saravana Kannan wrote:
>> In all Qualcomm chipsets (well, at least the ones that have been used in
>> Android devices so far), we can switch the frequency of any CPU from any
>> other CPU. If we can do that even without fast switching, why wouldn't any
>> theoretical fast switching be incapable of supporting this? Is this a
>> limitation specific to x86 that we are assuming all architectures and
>> platforms are going to have?
>
> So the typical implementation of fast switching we're thinking of is the
> CPU writing the DVFS request into a machine register. Now machine
> registers are typically per logical CPU.
>
But, if ARM decides to architect this and move it to a system/machine
register, we will end up with the same limitation :( IMO.
For now, with an SCMI kind of interface, there's no such limitation, as
you already mentioned in the follow-up email.
--
Regards,
Sudeep
On Thu, Jul 13, 2017 at 12:14:37PM +0530, Viresh Kumar wrote:
> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
> index 47e24b5384b3..606b1a37a1af 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -275,6 +275,10 @@ static void dbs_update_util_handler(struct update_util_data *data, u64 time,
> struct policy_dbs_info *policy_dbs = cdbs->policy_dbs;
> u64 delta_ns, lst;
>
> + /* Don't allow remote callbacks */
> + if (smp_processor_id() != data->cpu)
> + return;
> +
The alternative is using some of that policy_dbs->policy->*cpus crud I
suppose, because:
> /*
> * The work may not be allowed to be queued up right now.
> * Possible reasons:
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index b7fb8b7c980d..4bee2f4cbc28 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -1732,6 +1732,10 @@ static void intel_pstate_update_util_pid(struct update_util_data *data,
> struct cpudata *cpu = container_of(data, struct cpudata, update_util);
> u64 delta_ns = time - cpu->sample.time;
>
> + /* Don't allow remote callbacks */
> + if (smp_processor_id() != data->cpu)
> + return;
> +
> if ((s64)delta_ns < pid_params.sample_rate_ns)
> return;
>
> @@ -1749,6 +1753,10 @@ static void intel_pstate_update_util(struct update_util_data *data, u64 time,
> struct cpudata *cpu = container_of(data, struct cpudata, update_util);
> u64 delta_ns;
>
> + /* Don't allow remote callbacks */
> + if (smp_processor_id() != data->cpu)
> + return;
> +
> if (flags & SCHED_CPUFREQ_IOWAIT) {
> cpu->iowait_boost = int_tofp(1);
> } else if (cpu->iowait_boost) {
For these we can already use cpu->cpu, which would make:
> diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
> index d2be2ccbb372..8256a8f35f22 100644
> --- a/include/linux/sched/cpufreq.h
> +++ b/include/linux/sched/cpufreq.h
> @@ -16,6 +16,7 @@
> #ifdef CONFIG_CPU_FREQ
> struct update_util_data {
> void (*func)(struct update_util_data *data, u64 time, unsigned int flags);
> + unsigned int cpu;
> };
>
> void cpufreq_add_update_util_hook(int cpu, struct update_util_data *data,
> diff --git a/kernel/sched/cpufreq.c b/kernel/sched/cpufreq.c
> index dbc51442ecbc..ee4c596b71b4 100644
> --- a/kernel/sched/cpufreq.c
> +++ b/kernel/sched/cpufreq.c
> @@ -42,6 +42,7 @@ void cpufreq_add_update_util_hook(int cpu, struct update_util_data *data,
> return;
>
> data->func = func;
> + data->cpu = cpu;
> rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), data);
> }
> EXPORT_SYMBOL_GPL(cpufreq_add_update_util_hook);
redundant.
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 29a397067ffa..ed9c589e5386 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -218,6 +218,10 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> unsigned int next_f;
> bool busy;
>
> + /* Remote callbacks aren't allowed for policies which aren't shared */
> + if (smp_processor_id() != hook->cpu)
> + return;
> +
> sugov_set_iowait_boost(sg_cpu, time, flags);
> sg_cpu->last_update = time;
>
> @@ -290,6 +294,10 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> unsigned long util, max;
> unsigned int next_f;
>
> + /* Don't allow remote callbacks */
> + if (smp_processor_id() != hook->cpu)
> + return;
> +
> sugov_get_util(&util, &max);
>
> raw_spin_lock(&sg_policy->update_lock);
Given the whole rq->lock thing, I suspect we could actually not do these
two. That would then continue to process the iowait and other accounting
stuff, but stall the moment we call into the actual driver, which will
then drop the request on the floor as per the first few hunks.
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index a84299f44b5d..7fcfaee39d19 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1136,7 +1136,7 @@ static void update_curr_dl(struct rq *rq)
> }
>
> /* kick cpufreq (see the comment in kernel/sched/sched.h). */
> - cpufreq_update_this_cpu(rq, SCHED_CPUFREQ_DL);
> + cpufreq_update_util(rq, SCHED_CPUFREQ_DL);
>
> schedstat_set(curr->se.statistics.exec_max,
> max(curr->se.statistics.exec_max, delta_exec));
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c95880e216f6..d378d02fdfcb 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3278,7 +3278,9 @@ static inline void set_tg_cfs_propagate(struct cfs_rq *cfs_rq) {}
>
> static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq)
> {
> - if (&this_rq()->cfs == cfs_rq) {
> + struct rq *rq = rq_of(cfs_rq);
> +
> + if (&rq->cfs == cfs_rq) {
> /*
> * There are a few boundary cases this might miss but it should
> * get called often enough that that should (hopefully) not be
> @@ -3295,7 +3297,7 @@ static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq)
> *
> * See cpu_util().
> */
> - cpufreq_update_util(rq_of(cfs_rq), 0);
> + cpufreq_update_util(rq, 0);
> }
> }
>
> @@ -4875,7 +4877,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> * passed.
> */
> if (p->in_iowait)
> - cpufreq_update_this_cpu(rq, SCHED_CPUFREQ_IOWAIT);
> + cpufreq_update_util(rq, SCHED_CPUFREQ_IOWAIT);
>
> for_each_sched_entity(se) {
> if (se->on_rq)
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 45caf937ef90..0af5ca9e3e3f 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -970,7 +970,7 @@ static void update_curr_rt(struct rq *rq)
> return;
>
> /* Kick cpufreq (see the comment in kernel/sched/sched.h). */
> - cpufreq_update_this_cpu(rq, SCHED_CPUFREQ_RT);
> + cpufreq_update_util(rq, SCHED_CPUFREQ_RT);
>
> schedstat_set(curr->se.statistics.exec_max,
> max(curr->se.statistics.exec_max, delta_exec));
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index eeef1a3086d1..aa9d5b87b4f8 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -2070,19 +2070,13 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
> {
> struct update_util_data *data;
>
> - data = rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data));
> + data = rcu_dereference_sched(*per_cpu_ptr(&cpufreq_update_util_data,
> + cpu_of(rq)));
> if (data)
> data->func(data, rq_clock(rq), flags);
> }
> -
> -static inline void cpufreq_update_this_cpu(struct rq *rq, unsigned int flags)
> -{
> - if (cpu_of(rq) == smp_processor_id())
> - cpufreq_update_util(rq, flags);
> -}
> #else
> static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
> -static inline void cpufreq_update_this_cpu(struct rq *rq, unsigned int flags) {}
> #endif /* CONFIG_CPU_FREQ */
This seems ok. Except of course you'll have conflicts with Juri's patch
set, but that should be trivial to sort out.
On 21-07-17, 15:03, Peter Zijlstra wrote:
> On Thu, Jul 13, 2017 at 12:14:37PM +0530, Viresh Kumar wrote:
> > @@ -42,6 +42,7 @@ void cpufreq_add_update_util_hook(int cpu, struct update_util_data *data,
> > return;
> >
> > data->func = func;
> > + data->cpu = cpu;
> > rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), data);
> > }
> > EXPORT_SYMBOL_GPL(cpufreq_add_update_util_hook);
>
> redundant.
Actually we will still need it. We pass hook->cpu to sugov_get_util()
in the 2nd patch of this series and there is no workaround possible
for that.
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 29a397067ffa..ed9c589e5386 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -218,6 +218,10 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> > unsigned int next_f;
> > bool busy;
> >
> > + /* Remote callbacks aren't allowed for policies which aren't shared */
> > + if (smp_processor_id() != hook->cpu)
> > + return;
> > +
> > sugov_set_iowait_boost(sg_cpu, time, flags);
> > sg_cpu->last_update = time;
> >
> > @@ -290,6 +294,10 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> > unsigned long util, max;
> > unsigned int next_f;
> >
> > + /* Don't allow remote callbacks */
> > + if (smp_processor_id() != hook->cpu)
> > + return;
> > +
> > sugov_get_util(&util, &max);
> >
> > raw_spin_lock(&sg_policy->update_lock);
>
>
> Given the whole rq->lock thing, I suspect we could actually not do these
> two.
You meant sugov_get_util() and raw_spin_lock()? Why?
The locking is required here in the shared-policy case to make sure
only one CPU is updating the frequency for the entire policy. And we
can't really avoid that even with the rq->lock guarantees from the
scheduler for the target CPU.
> That would then continue to process the iowait and other accounting
> stuff, but stall the moment we call into the actual driver, which will
> then drop the request on the floor as per the first few hunks.
I am not sure I understood your comment completely though.
> This seems ok. Except of course you'll have conflicts with Juri's patch
> set, but that should be trivial to sort out.
Yeah, I wouldn't mind rebasing if his series gets in first.
--
viresh
On Mon, Jul 24, 2017 at 04:31:22PM +0530, Viresh Kumar wrote:
> On 21-07-17, 15:03, Peter Zijlstra wrote:
> > On Thu, Jul 13, 2017 at 12:14:37PM +0530, Viresh Kumar wrote:
> > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > index 29a397067ffa..ed9c589e5386 100644
> > > --- a/kernel/sched/cpufreq_schedutil.c
> > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > @@ -218,6 +218,10 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> > > unsigned int next_f;
> > > bool busy;
> > >
> > > + /* Remote callbacks aren't allowed for policies which aren't shared */
> > > + if (smp_processor_id() != hook->cpu)
> > > + return;
> > > +
> > > sugov_set_iowait_boost(sg_cpu, time, flags);
> > > sg_cpu->last_update = time;
> > >
> > > @@ -290,6 +294,10 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> > > unsigned long util, max;
> > > unsigned int next_f;
> > >
> > > + /* Don't allow remote callbacks */
> > > + if (smp_processor_id() != hook->cpu)
> > > + return;
> > > +
> > > sugov_get_util(&util, &max);
> > >
> > > raw_spin_lock(&sg_policy->update_lock);
> >
> >
> > Given the whole rq->lock thing, I suspect we could actually not do these
> > two.
>
> You meant sugov_get_util() and raw_spin_lock()? Why?
>
> The locking is required here in the shared-policy case to make sure
> only one CPU is updating the frequency for the entire policy. And we
> can't really avoid that even with the rq->lock guarantees from the
> scheduler for the target CPU.
I said nothing about the shared locking. That is indeed required. All I
said is that those two tests you add could be left out.
> > That would then continue to process the iowait and other accounting
> > stuff, but stall the moment we call into the actual driver, which will
> > then drop the request on the floor as per the first few hunks.
>
> I am not sure I understood your comment completely though.
Since we call cpufreq_update_util(@rq, ...) with @rq->lock held, all
such calls are in fact serialized for that cpu. Therefore the cpu !=
current_cpu tests you add are pointless.
Only once we get to the actual cpufreq driver (intel_pstate and others)
do we run into the fact that we might not be able to service the request
remotely. But since you also add a test there, that is sufficient.
On 24-07-17, 15:47, Peter Zijlstra wrote:
> I said nothing about the shared locking. That is indeed required. All I
> said is that those two tests you add could be left out.
I was right, I didn't understand your comment at all :(
> > > That would then continue to process the iowait and other accounting
> > > stuff, but stall the moment we call into the actual driver, which will
> > > then drop the request on the floor as per the first few hunks.
> >
> > I am not sure I understood your comment completely though.
>
> Since we call cpufreq_update_util(@rq, ...) with @rq->lock held, all
> such calls are in fact serialized for that cpu.
Yes, they are serialized but ..
> Therefore the cpu !=
> current_cpu test you add are pointless.
.. I didn't understand why you said so. This check isn't there to take
care of serialization, but of remote callbacks.
> Only once we get to the actual cpufreq driver (intel_pstate and others)
> do we run into the fact that we might not be able to service the request
> remotely.
We never check for remote callbacks in drivers.
> But since you also add a test there, that is sufficient.
No.
The diff for intel-pstate that you saw in this patch was for the case
where intel-pstate works directly with the scheduler (i.e. no
schedutil governor). The routine that gets called with schedutil is
intel_cpufreq_target(), which doesn't check for remoteness at all.
--
viresh
On Wed, Jul 26, 2017 at 11:59:12AM +0530, Viresh Kumar wrote:
> On 24-07-17, 15:47, Peter Zijlstra wrote:
> > I said nothing about the shared locking. That is indeed required. All I
> > said is that those two tests you add could be left out.
>
> I was right, I didn't understood your comment at all :(
>
> > > > That would then continue to process the iowait and other accounting
> > > > stuff, but stall the moment we call into the actual driver, which will
> > > > then drop the request on the floor as per the first few hunks.
> > >
> > > I am not sure I understood your comment completely though.
> >
> > Since we call cpufreq_update_util(@rq, ...) with @rq->lock held, all
> > such calls are in fact serialized for that cpu.
>
> Yes, they are serialized but ..
>
> > Therefore the cpu !=
> > current_cpu test you add are pointless.
>
> .. I didn't understand why you said so. This check isn't there to take
> care of serialization but remote callbacks.
>
> > Only once we get to the actual cpufreq driver (intel_pstate and others)
> > do we run into the fact that we might not be able to service the request
> > remotely.
>
> We never check for remote callbacks in drivers.
>
> > But since you also add a test there, that is sufficient.
>
> No.
>
> The diff for intel-pstate that you saw in this patch was for the case
> where intel-pstate works directly with the scheduler (i.e. no
> schedutil governor). The routine that gets called with schedutil is
> intel_cpufreq_target(), which doesn't check for remoteness at all.
Argh, what a horrible mess.. :-(
On Wednesday, July 26, 2017 11:59:12 AM Viresh Kumar wrote:
> On 24-07-17, 15:47, Peter Zijlstra wrote:
> > I said nothing about the shared locking. That is indeed required. All I
> > said is that those two tests you add could be left out.
>
> I was right, I didn't understood your comment at all :(
>
> > > > That would then continue to process the iowait and other accounting
> > > > stuff, but stall the moment we call into the actual driver, which will
> > > > then drop the request on the floor as per the first few hunks.
> > >
> > > I am not sure I understood your comment completely though.
> >
> > Since we call cpufreq_update_util(@rq, ...) with @rq->lock held, all
> > such calls are in fact serialized for that cpu.
>
> Yes, they are serialized but ..
>
> > Therefore the cpu !=
> > current_cpu test you add are pointless.
>
> .. I didn't understand why you said so. This check isn't there to take
> care of serialization but remote callbacks.
>
> > Only once we get to the actual cpufreq driver (intel_pstate and others)
> > do we run into the fact that we might not be able to service the request
> > remotely.
>
> We never check for remote callbacks in drivers.
>
> > But since you also add a test there, that is sufficient.
>
> No.
>
> The diff for intel-pstate that you saw in this patch was for the case
> where intel-pstate works directly with the scheduler (i.e. no
> schedutil governor). The routine that gets called with schedutil is
> intel_cpufreq_target(), which doesn't check for remoteness at all.
And of course acpi-cpufreq doesn't check for that either, for example.
Thanks,
Rafael
On 07/20/2017 05:22 AM, Peter Zijlstra wrote:
> So the typical implementation of fast switching we're thinking of is the
> CPU writing the DVFS request into a machine register. Now machine
> registers are typically per logical CPU.
Writing to a memory addressable register. AFAIK, ARM has no support for
a machine register for DVFS request. So, even if any ARM licensee wants
to add one, it won't be possible.
Also, even if we have an ARM CPU with a machine register, rejecting a
valid frequency switch just because it happened to come from a different
CPU seems silly (you can have a huge performance hit due to that). A much
better solution is to just send an IPI to the right CPU and execute the
machine register write there.
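Roughly something like this (illustrative only; the helper names here
are made up, not from an existing driver):
/* Runs on the target CPU via IPI and performs the actual register write */
static void example_write_perf_request(void *info)
{
        unsigned int freq = *(unsigned int *)info;
        example_write_freq_register(freq);      /* hypothetical per-CPU write */
}
static void example_set_freq(int cpu, unsigned int freq)
{
        if (cpu == smp_processor_id())
                example_write_perf_request(&freq);
        else
                smp_call_function_single(cpu, example_write_perf_request,
                                         &freq, 1);
}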
-Saravana
--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
On 07/21/2017 06:03 AM, Peter Zijlstra wrote:
> On Thu, Jul 13, 2017 at 12:14:37PM +0530, Viresh Kumar wrote:
>> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
>> index 47e24b5384b3..606b1a37a1af 100644
>> --- a/drivers/cpufreq/cpufreq_governor.c
>> +++ b/drivers/cpufreq/cpufreq_governor.c
>> @@ -275,6 +275,10 @@ static void dbs_update_util_handler(struct update_util_data *data, u64 time,
>> struct policy_dbs_info *policy_dbs = cdbs->policy_dbs;
>> u64 delta_ns, lst;
>>
>> + /* Don't allow remote callbacks */
>> + if (smp_processor_id() != data->cpu)
>> + return;
>> +
>
> The alternative is using some of that policy_dbs->policy->*cpus crud I
> suppose, because:
No, the alternative is to pass it on to the CPU freq driver and let it
decide what it wants to do. That's the whole point of having a CPU freq
driver -- so that the generic code doesn't need to care about HW
specific details. Which is the point I was making in an earlier email to
Viresh's patch -- we shouldn't be doing any CPU check for the callbacks
at the scheduler or even the governor level.
That would simplify this whole thing by deleting a bunch of code. And
having much simpler checks in those drivers that actually have to deal
with their HW specific details.
--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
On 26-07-17, 14:00, Saravana Kannan wrote:
> No, the alternative is to pass it on to the CPU freq driver and let it
> decide what it wants to do. That's the whole point if having a CPU freq
> driver -- so that the generic code doesn't need to care about HW specific
> details. Which is the point I was making in an earlier email to Viresh's
> patch -- we shouldn't be doing any CPU check for the call backs at the
> scheduler or ever governor level.
>
> That would simplify this whole thing by deleting a bunch of code. And having
> much simpler checks in those drivers that actually have to deal with their
> HW specific details.
So what you are saying is that we go and update (almost) every cpufreq
driver we have today and make their ->target() callbacks return early
if they don't support switching frequency remotely ? Is that really
simplifying anything?
The core already has most of the data required and I believe that we
need to handle it in the governor's code as is handled in this series.
To solve the problem that you have been reporting (update from any
CPU), we need something like what I suggested earlier, and I will come
back to it after this series goes in. Don't want to complicate things
here unnecessarily.
https://marc.info/?l=linux-kernel&m=148906012827786&w=2
--
viresh
On 07/26/2017 08:30 PM, Viresh Kumar wrote:
> On 26-07-17, 14:00, Saravana Kannan wrote:
>> No, the alternative is to pass it on to the CPU freq driver and let it
>> decide what it wants to do. That's the whole point if having a CPU freq
>> driver -- so that the generic code doesn't need to care about HW specific
>> details. Which is the point I was making in an earlier email to Viresh's
>> patch -- we shouldn't be doing any CPU check for the call backs at the
>> scheduler or ever governor level.
>>
>> That would simplify this whole thing by deleting a bunch of code. And having
>> much simpler checks in those drivers that actually have to deal with their
>> HW specific details.
>
> So what you are saying is that we go and update (almost) every cpufreq
> driver we have today and make their ->target() callbacks return early
> if they don't support switching frequency remotely ? Is that really
> simplifying anything?
Yes. Simplifying isn't always about number of lines of code. It's also
about abstraction. Having generic scheduler code care about HW details
doesn't seem nice.
It'll literally be one simple check: (cpu == smp_processor_id()) or (cpu
"in" policy->cpus).
Also, this is only for drivers that currently support fast switching.
How many of those do you have?
> The core already has most of the data required and I believe that we
> need to handle it in the governor's code as is handled in this series.
Clearly, it doesn't. You are just making assumptions about HW.
> To solve the problem that you have been reporting (update from any
> CPU), we need something like this which I earlier suggested and I
> will come back to it after this series is gone. Don't want to
> complicate things here unnecessarily.
>
> https://marc.info/?l=linux-kernel&m=148906012827786&w=2
I'm okay with handling it later. I'm just saying that if we are going to
go back and debate the CPU check, then maybe it's better to do it in one
series.
-Saravana
--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
On Thu, Jul 27, 2017 at 12:55 PM, Saravana Kannan
<[email protected]> wrote:
> On 07/26/2017 08:30 PM, Viresh Kumar wrote:
>>
>> On 26-07-17, 14:00, Saravana Kannan wrote:
>>>
>>> No, the alternative is to pass it on to the CPU freq driver and let it
>>> decide what it wants to do. That's the whole point if having a CPU freq
>>> driver -- so that the generic code doesn't need to care about HW specific
>>> details. Which is the point I was making in an earlier email to Viresh's
>>> patch -- we shouldn't be doing any CPU check for the call backs at the
>>> scheduler or ever governor level.
>>>
>>> That would simplify this whole thing by deleting a bunch of code. And
>>> having
>>> much simpler checks in those drivers that actually have to deal with
>>> their
>>> HW specific details.
>>
>>
>> So what you are saying is that we go and update (almost) every cpufreq
>> driver we have today and make their ->target() callbacks return early
>> if they don't support switching frequency remotely ? Is that really
>> simplifying anything?
>
>
> Yes. Simplifying isn't always about number of lines of code. It's also about
> abstraction. Having generic scheduler code care about HW details doesn't
> seem nice.
>
> It'll literally one simple check (cpu == smp_processor_id()) or (cpu "in"
> policy->cpus).
>
I think we can have both approaches? So we query the driver some time
around sugov_should_update_freq (with a new driver callback?) and ask
it if it has any say over the default behavior of "can't update a remote
CPU if I'm not a part of its policy", and use that instead of the
default; if the driver hasn't defined it in its struct cpufreq_driver,
we stick with the default.
I think this will also avoid the concern of "updating every
driver", since we can just stick to the sane default of "no" for
drivers that haven't defined it. Probably Viresh has already thought
about this, but I just thought of bringing it up anyway. I also think
it's fine to handle this case after this series gets in, but that's
just my opinion.
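Something along these lines, maybe (a rough sketch only; the field name
is borrowed from the dvfs_possible_from_any_cpu flag that was dropped
after V1, and its placement here is an assumption, not code from this
series):
static bool sugov_allow_remote_update(struct cpufreq_policy *policy)
{
        /* Driver opted in: any CPU may do DVFS on behalf of any other CPU. */
        if (policy->dvfs_possible_from_any_cpu)
                return true;
        /* Default: the local CPU must share the target CPU's policy. */
        return cpumask_test_cpu(smp_processor_id(), policy->cpus);
}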
thanks!
-Joel
On 27-07-17, 12:55, Saravana Kannan wrote:
> Yes. Simplifying isn't always about number of lines of code. It's also about
> abstraction. Having generic scheduler code care about HW details doesn't
> seem nice.
I can argue that even the policy->cpus field is also hardware
specific, isn't it? And we are using that in the schedutil governor
anyway. What's wrong with having another field (in a generic way) in
the same structure that tells us more about the hardware?
And then schedutil isn't really scheduler, but a cpufreq governor.
Just like ondemand/conservative, which are also called from the same
scheduler path.
> It'll literally one simple check (cpu == smp_processor_id()) or (cpu "in"
> policy->cpus).
>
> Also, this is only for drivers that currently support fast switching. How
> many of those do you have?
Why? Why shouldn't we do that for the other drivers? I think it should
be done across everyone.
> >The core already has most of the data required and I believe that we
> >need to handle it in the governor's code as is handled in this series.
>
> Clearly, it doesn't. You are just making assumptions about HW.
So assuming that any CPU from a policy can change freq on behalf of
all the CPUs of the same policy is wrong? That is the basis of how the
cpufreq core is designed.
--
viresh
On 07/27/2017 11:00 PM, Viresh Kumar wrote:
> On 27-07-17, 12:55, Saravana Kannan wrote:
>> Yes. Simplifying isn't always about number of lines of code. It's also about
>> abstraction. Having generic scheduler code care about HW details doesn't
>> seem nice.
>
> I can argue that even the policy->cpus field is also hardware
> specific, isn't it ?
Yes.
> And we are using that in the schedutil governor
> anyway.
Yes
> What's wrong with having another field (in a generic way) in
> the same structure that tells us more about hardware ?
Nothing wrong. I'm not saying you shouldn't have the cpu field in the
data or as a parameter to the hook. You'll definitely need that.
> And then schedutil isn't really scheduler, but a cpufreq governor.
> Just like ondemand/conservative, which are also called from the same
> scheduler path.
Exactly. I never debated anything about schedutil. I'm just saying don't
have any CPU limitations or checks on the scheduler side when sending
the notification. The scheduler shouldn't have to know/care whether the
driver can only set the freq on that CPU, or across CPUs in a cluster,
or across the entire system.
>> It'll literally one simple check (cpu == smp_processor_id()) or (cpu "in"
>> policy->cpus).
>>
>> Also, this is only for drivers that currently support fast switching. How
>> many of those do you have?
>
> Why? Why shouldn't we do that for the other drivers? I think it should
> be done across everyone.
Because if I remember it right, the "don't send the notification if it's
not the same CPU" limitation is only for the fast switching case? I
might be mistaken about this part though.
>>> The core already has most of the data required and I believe that we
>>> need to handle it in the governor's code as is handled in this series.
>>
>> Clearly, it doesn't. You are just making assumptions about HW.
>
> So assuming that any CPU from a policy can change freq on behalf of
> all the CPUs of the same policy is wrong? That is the basis of how the
> cpufreq core is designed.
1. I'm not saying that. I'm saying assuming CPUs can change the freq
only on behalf of all the CPUs in the same policy is wrong. Again, the
scheduler or governor shouldn't even be making any of that assumption.
That's a CPUfreq driver problem.
2. No, that is not the basis of the entire cpufreq core design. None of
the existing CPUfreq code has any assumptions that only CPUs in a policy
can change their frequency. It doesn't break in any way on systems where
any CPU can change any other CPU's frequency -- all Qualcomm chips are
like that. It's only the recent scheduler notifier changes that are
adding this additional limitation and breaking stuff for systems where
any CPU can change any other CPU's frequency.
-Saravana
--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
On 28-07-17, 14:05, Saravana Kannan wrote:
> 1. I'm not saying that. I'm saying assuming CPUs can change the freq only on
> behalf of all the CPUs in the same policy is wrong. Again, the scheduler or
> governor shouldn't even be making any of that assumption. That's a CPUfreq
> driver problem.
>
> 2. No, that is not the basis of the entire cpufreq core design. None of the
> existing CPUfreq code has any assumptions that only CPUs in a policy can
> change their frequency. It doesn't break in any way in system where any CPU
> can change any other CPU's frequency -- all Qualcomm chips are like that.
> It's only the recent scheduler notifier changes that are adding this
> additional limitation and breaking stuff for systems where any CPU can
> change any other CPU's frequency.
Can you please have a look at V5 and see if the solution proposed there
would be fine?
--
viresh