2017-03-24 14:09:27

by Juri Lelli

Subject: [RFD PATCH 0/5] SCHED_DEADLINE freq/cpu invariance and OPP selection

Hi,

this is a very exploratory set implementing frequency/cpu invariance and OPP
selection for SCHED_DEADLINE. The set has only been lightly tested on a Juno
platform. While the actual implementation is very premature, I'm posting this
early to facilitate discussion at OSPM-summit [1].

Results of the testing, highlighting why these features are useful, are
available here:

- without the set
https://gist.github.com/a6e3ee99cec32e00cc537b53cd3d54d2

- with the set
https://gist.github.com/1f7d485fc3ce9234fe627dcb53b2935c

The set is based on tip/sched/core as of today (bc4278987e38) plus a couple of
schedutil fixes coming from linux-pm/linux-next and Luca's "CPU reclaiming for
SCHED_DEADLINE" v5 [2].

Patches high level description:

o [01-02]/05 add the necessary links to start accounting the DEADLINE
contribution to OPP selection
o 03/05 is a hack to make it possible (on ARM) to change frequency for
DEADLINE tasks (which would otherwise possibly delay the SCHED_FIFO
worker kthread); suggestions on how to do this properly are very welcome
o 04/05 is a schedutil change that copes with the fact that DEADLINE
doesn't require a periodic OPP selection triggering point
o 05/05 implements frequency/cpu invariance for tasks' reservation
parameters*; which basically means that we implement GRUB-PA [3]

Please have a look. Feedback on how we want to shape this is the sole purpose
of this posting.

In case you would like to test this out:

git://linux-arm.org/linux-jl.git upstream/deadline/freq-rfd

Best,

- Juri

[1] http://retis.sssup.it/ospm-summit/index.html
[2] https://marc.info/?l=linux-kernel&m=149029880524038
[3] C. Scordino, G. Lipari, A Resource Reservation Algorithm for Power-Aware
Scheduling of Periodic and Aperiodic Real-Time Tasks, IEEE Transactions
on Computers, December 2006.

* Notice that this currently breaks !CONFIG_SMP, as arch_scale_{freq,cpu}_capacity
and SCHED_CAPACITY_SCALE are only defined under CONFIG_SMP. Fixing this
particular issue is straightforward, but we should probably look into making
frequency scaling (and PELT averages) available on !CONFIG_SMP as well (so
that schedutil can work on such configurations, for example). Since this is
only an RFD and since a proper rework might be non-trivial, I decided to leave
it out of scope for the time being.

Juri Lelli (5):
sched/cpufreq_schedutil: make use of DEADLINE utilization signal
sched/deadline: move cpu frequency selection triggering points
sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE
sched/cpufreq_schedutil: always consider all CPUs when deciding next
freq
sched/deadline: make bandwidth enforcement scale-invariant

include/linux/sched.h | 1 +
include/linux/sched/cpufreq.h | 2 --
include/uapi/linux/sched.h | 1 +
kernel/sched/core.c | 19 +++++++++++++++++--
kernel/sched/cpufreq_schedutil.c | 37 ++++++++++++++++++++++---------------
kernel/sched/deadline.c | 40 +++++++++++++++++++++++++++++++++-------
kernel/sched/fair.c | 2 --
kernel/sched/sched.h | 10 +++++++++-
8 files changed, 83 insertions(+), 29 deletions(-)

--
2.10.0


2017-03-24 14:09:38

by Juri Lelli

Subject: [RFD PATCH 2/5] sched/deadline: move cpu frequency selection triggering points

Since SCHED_DEADLINE doesn't track a utilization signal (but reserves a
fraction of CPU bandwidth for tasks admitted to the system), there is no
point in evaluating frequency changes during each tick event.

Move the frequency selection triggering points to where running_bw changes.

Co-authored-by: Claudio Scordino <[email protected]>
Signed-off-by: Juri Lelli <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Luca Abeni <[email protected]>
---
kernel/sched/deadline.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 55471016d73c..5c1a205e830f 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -52,6 +52,8 @@ void add_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
dl_rq->running_bw += dl_bw;
SCHED_WARN_ON(dl_rq->running_bw < old); /* overflow */
SCHED_WARN_ON(dl_rq->running_bw > dl_rq->this_bw);
+ /* kick cpufreq (see the comment in kernel/sched/sched.h). */
+ cpufreq_update_this_cpu(rq_of_dl_rq(dl_rq), SCHED_CPUFREQ_DL);
}

static inline
@@ -64,6 +66,8 @@ void sub_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
SCHED_WARN_ON(dl_rq->running_bw > old); /* underflow */
if (dl_rq->running_bw > old)
dl_rq->running_bw = 0;
+ /* kick cpufreq (see the comment in kernel/sched/sched.h). */
+ cpufreq_update_this_cpu(rq_of_dl_rq(dl_rq), SCHED_CPUFREQ_DL);
}

static inline
@@ -953,9 +957,6 @@ static void update_curr_dl(struct rq *rq)
return;
}

- /* kick cpufreq (see the comment in kernel/sched/sched.h). */
- cpufreq_update_this_cpu(rq, SCHED_CPUFREQ_DL);
-
schedstat_set(curr->se.statistics.exec_max,
max(curr->se.statistics.exec_max, delta_exec));

--
2.10.0

2017-03-24 14:09:49

by Juri Lelli

Subject: [RFD PATCH 1/5] sched/cpufreq_schedutil: make use of DEADLINE utilization signal

SCHED_DEADLINE tracks its active utilization signal with a per-rq variable
named running_bw.

Make use of that to drive cpu frequency selection: add up the FAIR and
DEADLINE contributions to get the CPU capacity required to handle both.

Co-authored-by: Claudio Scordino <[email protected]>
Signed-off-by: Juri Lelli <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Luca Abeni <[email protected]>
---
include/linux/sched/cpufreq.h | 2 --
kernel/sched/cpufreq_schedutil.c | 13 ++++++-------
2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
index d2be2ccbb372..39640bb3a8ee 100644
--- a/include/linux/sched/cpufreq.h
+++ b/include/linux/sched/cpufreq.h
@@ -11,8 +11,6 @@
#define SCHED_CPUFREQ_DL (1U << 1)
#define SCHED_CPUFREQ_IOWAIT (1U << 2)

-#define SCHED_CPUFREQ_RT_DL (SCHED_CPUFREQ_RT | SCHED_CPUFREQ_DL)
-
#ifdef CONFIG_CPU_FREQ
struct update_util_data {
void (*func)(struct update_util_data *data, u64 time, unsigned int flags);
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index f5ffe241812e..05f5625ea005 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -154,12 +154,11 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
static void sugov_get_util(unsigned long *util, unsigned long *max)
{
struct rq *rq = this_rq();
- unsigned long cfs_max;
+ unsigned long dl_util = (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> 20;

- cfs_max = arch_scale_cpu_capacity(NULL, smp_processor_id());
+ *max = arch_scale_cpu_capacity(NULL, smp_processor_id());

- *util = min(rq->cfs.avg.util_avg, cfs_max);
- *max = cfs_max;
+ *util = min(rq->cfs.avg.util_avg + dl_util, *max);
}

static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
@@ -207,7 +206,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
if (!sugov_should_update_freq(sg_policy, time))
return;

- if (flags & SCHED_CPUFREQ_RT_DL) {
+ if (flags & SCHED_CPUFREQ_RT) {
next_f = policy->cpuinfo.max_freq;
} else {
sugov_get_util(&util, &max);
@@ -242,7 +241,7 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu)
j_sg_cpu->iowait_boost = 0;
continue;
}
- if (j_sg_cpu->flags & SCHED_CPUFREQ_RT_DL)
+ if (j_sg_cpu->flags & SCHED_CPUFREQ_RT)
return policy->cpuinfo.max_freq;

j_util = j_sg_cpu->util;
@@ -278,7 +277,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
sg_cpu->last_update = time;

if (sugov_should_update_freq(sg_policy, time)) {
- if (flags & SCHED_CPUFREQ_RT_DL)
+ if (flags & SCHED_CPUFREQ_RT)
next_f = sg_policy->policy->cpuinfo.max_freq;
else
next_f = sugov_next_freq_shared(sg_cpu);
--
2.10.0

2017-03-24 14:10:12

by Juri Lelli

Subject: [RFD PATCH 3/5] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE

Worker kthread needs to be able to change frequency for all other
threads.

Make it special, just under STOP class.

Signed-off-by: Juri Lelli <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Luca Abeni <[email protected]>
Cc: Claudio Scordino <[email protected]>
---
include/linux/sched.h | 1 +
include/uapi/linux/sched.h | 1 +
kernel/sched/core.c | 19 +++++++++++++++++--
kernel/sched/cpufreq_schedutil.c | 15 ++++++++++++---
kernel/sched/deadline.c | 6 ++++++
kernel/sched/sched.h | 8 +++++++-
6 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 952cac87e433..6f508980f320 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1351,6 +1351,7 @@ extern int idle_cpu(int cpu);
extern int sched_setscheduler(struct task_struct *, int, const struct sched_param *);
extern int sched_setscheduler_nocheck(struct task_struct *, int, const struct sched_param *);
extern int sched_setattr(struct task_struct *, const struct sched_attr *);
+extern int sched_setattr_nocheck(struct task_struct *, const struct sched_attr *);
extern struct task_struct *idle_task(int cpu);

/**
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index e2a6c7b3510b..72723859ef74 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -48,5 +48,6 @@
*/
#define SCHED_FLAG_RESET_ON_FORK 0x01
#define SCHED_FLAG_RECLAIM 0x02
+#define SCHED_FLAG_SPECIAL 0x04

#endif /* _UAPI_LINUX_SCHED_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 378d402ee7a6..9b211c77cb54 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2495,6 +2495,9 @@ static int dl_overflow(struct task_struct *p, int policy,
u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0;
int cpus, err = -1;

+ if (attr->sched_flags & SCHED_FLAG_SPECIAL)
+ return 0;
+
/* !deadline task may carry old deadline bandwidth */
if (new_bw == p->dl.dl_bw && task_has_dl_policy(p))
return 0;
@@ -4052,6 +4055,10 @@ __getparam_dl(struct task_struct *p, struct sched_attr *attr)
static bool
__checkparam_dl(const struct sched_attr *attr)
{
+ /* special dl tasks don't actually use any parameter */
+ if (attr->sched_flags & SCHED_FLAG_SPECIAL)
+ return true;
+
/* deadline != 0 */
if (attr->sched_deadline == 0)
return false;
@@ -4138,7 +4145,9 @@ static int __sched_setscheduler(struct task_struct *p,
}

if (attr->sched_flags &
- ~(SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_RECLAIM))
+ ~(SCHED_FLAG_RESET_ON_FORK |
+ SCHED_FLAG_RECLAIM |
+ SCHED_FLAG_SPECIAL))
return -EINVAL;

/*
@@ -4260,7 +4269,8 @@ static int __sched_setscheduler(struct task_struct *p,
}
#endif
#ifdef CONFIG_SMP
- if (dl_bandwidth_enabled() && dl_policy(policy)) {
+ if (dl_bandwidth_enabled() && dl_policy(policy) &&
+ !(attr->sched_flags & SCHED_FLAG_SPECIAL)) {
cpumask_t *span = rq->rd->span;

/*
@@ -4390,6 +4400,11 @@ int sched_setattr(struct task_struct *p, const struct sched_attr *attr)
}
EXPORT_SYMBOL_GPL(sched_setattr);

+int sched_setattr_nocheck(struct task_struct *p, const struct sched_attr *attr)
+{
+ return __sched_setscheduler(p, attr, false, true);
+}
+
/**
* sched_setscheduler_nocheck - change the scheduling policy and/or RT priority of a thread from kernelspace.
* @p: the task in question.
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 05f5625ea005..da67a1cf91e7 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -394,7 +394,16 @@ static void sugov_policy_free(struct sugov_policy *sg_policy)
static int sugov_kthread_create(struct sugov_policy *sg_policy)
{
struct task_struct *thread;
- struct sched_param param = { .sched_priority = MAX_USER_RT_PRIO / 2 };
+ struct sched_attr attr = {
+ .size = sizeof(struct sched_attr),
+ .sched_policy = SCHED_DEADLINE,
+ .sched_flags = SCHED_FLAG_SPECIAL,
+ .sched_nice = 0,
+ .sched_priority = 0,
+ .sched_runtime = 0,
+ .sched_deadline = 0,
+ .sched_period = 0,
+ };
struct cpufreq_policy *policy = sg_policy->policy;
int ret;

@@ -412,10 +421,10 @@ static int sugov_kthread_create(struct sugov_policy *sg_policy)
return PTR_ERR(thread);
}

- ret = sched_setscheduler_nocheck(thread, SCHED_FIFO, &param);
+ ret = sched_setattr_nocheck(thread, &attr);
if (ret) {
kthread_stop(thread);
- pr_warn("%s: failed to set SCHED_FIFO\n", __func__);
+ pr_warn("%s: failed to set SCHED_DEADLINE\n", __func__);
return ret;
}

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 5c1a205e830f..853de524c6c6 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -131,6 +131,9 @@ static void task_non_contending(struct task_struct *p)
if (dl_se->dl_runtime == 0)
return;

+ if (dl_entity_is_special(dl_se))
+ return;
+
WARN_ON(hrtimer_active(&dl_se->inactive_timer));
WARN_ON(dl_se->dl_non_contending);

@@ -968,6 +971,9 @@ static void update_curr_dl(struct rq *rq)

sched_rt_avg_update(rq, delta_exec);

+ if (unlikely(dl_entity_is_special(dl_se)))
+ return;
+
if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM))
delta_exec = grub_reclaim(delta_exec, rq, curr->dl.dl_bw);
dl_se->runtime -= delta_exec;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 93c24528ceb6..7b5e81120813 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -155,13 +155,19 @@ static inline int task_has_dl_policy(struct task_struct *p)
return dl_policy(p->policy);
}

+static inline int dl_entity_is_special(struct sched_dl_entity *dl_se)
+{
+ return dl_se->flags & SCHED_FLAG_SPECIAL;
+}
+
/*
* Tells if entity @a should preempt entity @b.
*/
static inline bool
dl_entity_preempt(struct sched_dl_entity *a, struct sched_dl_entity *b)
{
- return dl_time_before(a->deadline, b->deadline);
+ return dl_entity_is_special(a) ||
+ dl_time_before(a->deadline, b->deadline);
}

/*
--
2.10.0

2017-03-24 14:10:17

by Juri Lelli

Subject: [RFD PATCH 4/5] sched/cpufreq_schedutil: always consider all CPUs when deciding next freq

No assumption can be made about the rate at which frequency updates get
triggered, as there are scheduling policies (like SCHED_DEADLINE) which
don't trigger them so frequently.

Remove that assumption from the code.

Signed-off-by: Juri Lelli <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Luca Abeni <[email protected]>
Cc: Claudio Scordino <[email protected]>
---
kernel/sched/cpufreq_schedutil.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index da67a1cf91e7..40f30373b709 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -233,14 +233,13 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu)
* If the CPU utilization was last updated before the previous
* frequency update and the time elapsed between the last update
* of the CPU utilization and the last frequency update is long
- * enough, don't take the CPU into account as it probably is
- * idle now (and clear iowait_boost for it).
+ * enough, reset iowait_boost, as it probably is not boosted
+ * anymore now.
*/
delta_ns = last_freq_update_time - j_sg_cpu->last_update;
- if (delta_ns > TICK_NSEC) {
+ if (delta_ns > TICK_NSEC)
j_sg_cpu->iowait_boost = 0;
- continue;
- }
+
if (j_sg_cpu->flags & SCHED_CPUFREQ_RT)
return policy->cpuinfo.max_freq;

--
2.10.0

2017-03-24 14:10:25

by Juri Lelli

Subject: [RFD PATCH 5/5] sched/deadline: make bandwidth enforcement scale-invariant

Apply frequency and cpu scale-invariance correction factors to bandwidth
enforcement (similar to what we already do for fair utilization tracking).

Each delta_exec gets scaled considering the current frequency and the
maximum cpu capacity; this means that the reservation runtime parameter
(which needs to be specified by profiling the task's execution at maximum
frequency on the biggest-capacity core) is thus scaled accordingly.

Signed-off-by: Juri Lelli <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Luca Abeni <[email protected]>
Cc: Claudio Scordino <[email protected]>
---
kernel/sched/deadline.c | 27 +++++++++++++++++++++++----
kernel/sched/fair.c | 2 --
kernel/sched/sched.h | 2 ++
3 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 853de524c6c6..7141d6f51ee0 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -940,7 +940,9 @@ static void update_curr_dl(struct rq *rq)
{
struct task_struct *curr = rq->curr;
struct sched_dl_entity *dl_se = &curr->dl;
- u64 delta_exec;
+ u64 delta_exec, scaled_delta_exec;
+ unsigned long scale_freq, scale_cpu;
+ int cpu = cpu_of(rq);

if (!dl_task(curr) || !on_dl_rq(dl_se))
return;
@@ -974,9 +976,26 @@ static void update_curr_dl(struct rq *rq)
if (unlikely(dl_entity_is_special(dl_se)))
return;

- if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM))
- delta_exec = grub_reclaim(delta_exec, rq, curr->dl.dl_bw);
- dl_se->runtime -= delta_exec;
+ /*
+ * XXX When clock frequency is controlled by the scheduler (via
+ * schedutil governor) we implement GRUB-PA: the spare reclaimed
+ * bandwidth is used to clock down frequency.
+ *
+ * However, what below seems to assume scheduler to always be in
+ * control of clock frequency; when running at a fixed frequency
+ * (e.g., performance or userspace governor), shouldn't we instead
+ * use the grub_reclaim mechanism below?
+ *
+ * if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM))
+ * delta_exec = grub_reclaim(delta_exec, rq, curr->dl.dl_bw);
+ * dl_se->runtime -= delta_exec;
+ */
+ scale_freq = arch_scale_freq_capacity(NULL, cpu);
+ scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
+
+ scaled_delta_exec = cap_scale(delta_exec, scale_freq);
+ scaled_delta_exec = cap_scale(scaled_delta_exec, scale_cpu);
+ dl_se->runtime -= scaled_delta_exec;

throttle:
if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2805bd7c8994..37f12d0a3bc4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2818,8 +2818,6 @@ static u32 __compute_runnable_contrib(u64 n)
return contrib + runnable_avg_yN_sum[n];
}

-#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
-
/*
* We can represent the historical contribution to runnable average as the
* coefficients of a geometric series. To do this we sub-divide our runnable
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7b5e81120813..81bd048ed181 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -155,6 +155,8 @@ static inline int task_has_dl_policy(struct task_struct *p)
return dl_policy(p->policy);
}

+#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
+
static inline int dl_entity_is_special(struct sched_dl_entity *dl_se)
{
return dl_se->flags & SCHED_FLAG_SPECIAL;
--
2.10.0

2017-03-27 16:50:57

by Peter Zijlstra

Subject: Re: [RFD PATCH 3/5] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE

On Fri, Mar 24, 2017 at 02:08:58PM +0000, Juri Lelli wrote:
> Worker kthread needs to be able to change frequency for all other
> threads.
>
> Make it special, just under STOP class.

*yuck* ;-)

So imagine our I2C/SPI bus is 'busy' and its mutex taken, then this
'special' task will need to boost it. Now add BWI to your thinking and
shudder.


On IRC broonie mentioned that:

- most PMIC operations are fire and forget (no need to wait for a
response).
- PMIC 'packets' are 'small'.
- SPI has the possibility to push stuff on the queue.

Taken together this seems to suggest we can rework cpufreq drivers to
function in-context, either directly push the packet on the bus if
available, or queue it and let whoever owns it sort it without blocking.

It might be possible to rework/augment I2C to also support pushing stuff
on a queue.


So if we can make all that work, we can do away with this horrible
horrible kthread. Which is, IMO, a much better solution.

Thoughts?

2017-03-27 17:13:30

by Juri Lelli

Subject: Re: [RFD PATCH 3/5] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE

On 27/03/17 19:05, Rafael J. Wysocki wrote:
> On Monday, March 27, 2017 06:01:34 PM Juri Lelli wrote:
> > On 27/03/17 18:50, Peter Zijlstra wrote:
> > > On Fri, Mar 24, 2017 at 02:08:58PM +0000, Juri Lelli wrote:
> > > > Worker kthread needs to be able to change frequency for all other
> > > > threads.
> > > >
> > > > Make it special, just under STOP class.
> > >
> > > *yuck* ;-)
> > >
> >
> > Eh, I know. :/
> >
> > > So imagine our I2C/SPI bus is 'busy' and its mutex taken, then this
> > > 'special' task will need to boost it. Now add BWI to your thinking and
> > > shudder.
> > >
> >
> > Currently that kthread is FIFO already, so boosting still applies. Not as
> > bad as in the BWI case though. More thinking required.
> >
> > >
> > > On IRC broonie mentioned that:
> > >
> > > - most PMIC operations are fire and forget (no need to wait for a
> > > response).
> > > - PMIC 'packets' are 'small'.
> > > - SPI has the possibility to push stuff on the queue.
> > >
> > > Taken together this seems to suggest we can rework cpufreq drivers to
> > > function in-context, either directly push the packet on the bus if
> > > available, or queue it and let whoever owns it sort it without blocking.
> > >
> > > It might be possible to rework/augment I2C to also support pushing stuff
> > > on a queue.
> > >
> > >
> > > So if we can make all that work, we can do away with this horrible
> > > horrible kthread. Which is, IMO, a much better solution.
> > >
> > > Thoughts?
> >
> > Right. This is more of a schedutil (cpufreq) problem though, IMHO. Even
> > though I agree that what you are proposing is much cleaner (and here I'm
> > assuming it's feasible at all), I fear it will take quite some time to
> > get reworked.
>
> Why do you think so?
>

It simply seemed a major rework to me. :)

2017-03-27 17:12:40

by Juri Lelli

Subject: Re: [RFD PATCH 3/5] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE

On 27/03/17 18:50, Peter Zijlstra wrote:
> On Fri, Mar 24, 2017 at 02:08:58PM +0000, Juri Lelli wrote:
> > Worker kthread needs to be able to change frequency for all other
> > threads.
> >
> > Make it special, just under STOP class.
>
> *yuck* ;-)
>

Eh, I know. :/

> So imagine our I2C/SPI bus is 'busy' and its mutex taken, then this
> 'special' task will need to boost it. Now add BWI to your thinking and
> shudder.
>

Currently that kthread is FIFO already, so boosting still applies. Not as
bad as in the BWI case though. More thinking required.

>
> On IRC broonie mentioned that:
>
> - most PMIC operations are fire and forget (no need to wait for a
> response).
> - PMIC 'packets' are 'small'.
> - SPI has the possibility to push stuff on the queue.
>
> Taken together this seems to suggest we can rework cpufreq drivers to
> function in-context, either directly push the packet on the bus if
> available, or queue it and let whoever owns it sort it without blocking.
>
> It might be possible to rework/augment I2C to also support pushing stuff
> on a queue.
>
>
> So if we can make all that work, we can do away with this horrible
> horrible kthread. Which is, IMO, a much better solution.
>
> Thoughts?

Right. This is more of a schedutil (cpufreq) problem though, IMHO. Even
though I agree that what you are proposing is much cleaner (and here I'm
assuming it's feasible at all), I fear it will take quite some time to
get reworked.

Do we want to wait until then to get the DEADLINE contribution
accounted for? :(

Thanks,

- Juri

2017-03-27 17:18:15

by Rafael J. Wysocki

Subject: Re: [RFD PATCH 3/5] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE

On Monday, March 27, 2017 06:01:34 PM Juri Lelli wrote:
> On 27/03/17 18:50, Peter Zijlstra wrote:
> > On Fri, Mar 24, 2017 at 02:08:58PM +0000, Juri Lelli wrote:
> > > Worker kthread needs to be able to change frequency for all other
> > > threads.
> > >
> > > Make it special, just under STOP class.
> >
> > *yuck* ;-)
> >
>
> Eh, I know. :/
>
> > So imagine our I2C/SPI bus is 'busy' and its mutex taken, then this
> > 'special' task will need to boost it. Now add BWI to your thinking and
> > shudder.
> >
>
> Currently that kthread is FIFO already, so boosting still applies. Not as
> bad as in the BWI case though. More thinking required.
>
> >
> > On IRC broonie mentioned that:
> >
> > - most PMIC operations are fire and forget (no need to wait for a
> > response).
> > - PMIC 'packets' are 'small'.
> > - SPI has the possibility to push stuff on the queue.
> >
> > Taken together this seems to suggest we can rework cpufreq drivers to
> > function in-context, either directly push the packet on the bus if
> > available, or queue it and let whoever owns it sort it without blocking.
> >
> > It might be possible to rework/augment I2C to also support pushing stuff
> > on a queue.
> >
> >
> > So if we can make all that work, we can do away with this horrible
> > horrible kthread. Which is, IMO, a much better solution.
> >
> > Thoughts?
>
> Right. This is more of a schedutil (cpufreq) problem though, IMHO. Even
> though I agree that what you are proposing is much cleaner (and here I'm
> assuming it's feasible at all), I fear it will take quite some time to
> get reworked.

Why do you think so?

Thanks,
Rafael

2017-03-27 17:43:27

by Rafael J. Wysocki

Subject: Re: [RFD PATCH 3/5] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE

On Monday, March 27, 2017 06:13:36 PM Juri Lelli wrote:
> On 27/03/17 19:05, Rafael J. Wysocki wrote:
> > On Monday, March 27, 2017 06:01:34 PM Juri Lelli wrote:
> > > On 27/03/17 18:50, Peter Zijlstra wrote:
> > > > On Fri, Mar 24, 2017 at 02:08:58PM +0000, Juri Lelli wrote:
> > > > > Worker kthread needs to be able to change frequency for all other
> > > > > threads.
> > > > >
> > > > > Make it special, just under STOP class.
> > > >
> > > > *yuck* ;-)
> > > >
> > >
> > > Eh, I know. :/
> > >
> > > > So imagine our I2C/SPI bus is 'busy' and its mutex taken, then this
> > > > 'special' task will need to boost it. Now add BWI to your thinking and
> > > > shudder.
> > > >
> > >
> > > Currently that kthread is FIFO already, so boosting still applies. Not as
> > > bad as in the BWI case though. More thinking required.
> > >
> > > >
> > > > On IRC broonie mentioned that:
> > > >
> > > > - most PMIC operations are fire and forget (no need to wait for a
> > > > response).
> > > > - PMIC 'packets' are 'small'.
> > > > - SPI has the possibility to push stuff on the queue.
> > > >
> > > > Taken together this seems to suggest we can rework cpufreq drivers to
> > > > function in-context, either directly push the packet on the bus if
> > > > available, or queue it and let whoever owns it sort it without blocking.
> > > >
> > > > It might be possible to rework/augment I2C to also support pushing stuff
> > > > on a queue.
> > > >
> > > >
> > > > So if we can make all that work, we can do away with this horrible
> > > > horrible kthread. Which is, IMO, a much better solution.
> > > >
> > > > Thoughts?
> > >
> > > Right. This is more of a schedutil (cpufreq) problem though, IMHO. Even
> > > though I agree that what you are proposing is much cleaner (and here I'm
> > > assuming it's feasible at all), I fear it will take quite some time to
> > > get reworked.
> >
> > Why do you think so?
> >
>
> It simply seemed a major rework to me. :)

OK

So there are two pieces here.

One is that if we want *all* drivers to work with schedutil, we need to keep
the kthread for the ones that will never be reworked (because nobody cares
etc). But then perhaps the kthread implementation may be left alone (because
nobody cares etc).

The second one is that there are drivers operating in-context that work with
schedutil already, so I don't see major obstacles to making more drivers work
that way. That would be only a matter of reworking the drivers in question.

Thanks,
Rafael

2017-03-27 18:06:14

by Mark Brown

Subject: Re: [RFD PATCH 3/5] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE

On Mon, Mar 27, 2017 at 06:50:11PM +0200, Peter Zijlstra wrote:

> Taken together this seems to suggest we can rework cpufreq drivers to
> function in-context, either directly push the packet on the bus if
> available, or queue it and let whoever owns it sort it without blocking.

Note that this isn't really cpufreq's problem per se: cpufreq works out
what voltage it wants and then tells the regulator API to do that, which
will in turn go and communicate with the PMIC somehow, typically with
regmap over I2C or SPI. All those layers use mutexes by default.

> So if we can make all that work, we can do away with this horrible
> horrible kthread. Which is, IMO, a much better solution.

> Thoughts?

I think it's doable, but a lot of work especially in the regulator code.



2017-03-27 18:10:26

by Mark Brown

Subject: Re: [RFD PATCH 3/5] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE

On Mon, Mar 27, 2017 at 07:37:39PM +0200, Rafael J. Wysocki wrote:

> One is that if we want *all* drivers to work with schedutil, we need to keep
> the kthread for the ones that will never be reworked (because nobody cares
> etc). But then perhaps the kthread implementation may be left alone (because
> nobody cares etc).

A very large proportion of the affected code (at least once you get into
the realm of talking to the PMICs) is shared in core code in various
subsystems rather than in individual drivers, so the problem might be
tractable.



2017-03-28 09:30:05

by Vincent Guittot

Subject: Re: [RFD PATCH 3/5] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE

On 27 March 2017 at 18:50, Peter Zijlstra <[email protected]> wrote:
> On Fri, Mar 24, 2017 at 02:08:58PM +0000, Juri Lelli wrote:
>> Worker kthread needs to be able to change frequency for all other
>> threads.
>>
>> Make it special, just under STOP class.
>
> *yuck* ;-)
>
> So imagine our I2C/SPI bus is 'busy' and its mutex taken, then this
> 'special' task will need to boost it. Now add BWI to your thinking and
> shudder.
>
>
> On IRC broonie mentioned that:
>
> - most PMIC operations are fire and forget (no need to wait for a
> response).
> - PMIC 'packets' are 'small'.
> - SPI has the possibility to push stuff on the queue.
>
> Taken together this seems to suggest we can rework cpufreq drivers to
> function in-context, either directly push the packet on the bus if
> available, or queue it and let whoever owns it sort it without blocking.
>
> It might be possible to rework/augment I2C to also support pushing stuff
> on a queue.

But sending the new voltage value to the PMIC is only part of the sequence.
When cpufreq sets a new OPP, it does:

- set the new voltage
- wait for the voltage to settle down
- set the new clock frequency

You may even have to switch to an intermediate clock source.

When such a sequence is managed by the kernel, we can't easily get rid
of the kthread

>
>
> So if we can make all that work, we can do away with this horrible
> horrible kthread. Which is, IMO, a much better solution.
>
> Thoughts?

2017-03-28 10:21:02

by Peter Zijlstra

Subject: Re: [RFD PATCH 3/5] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE

On Tue, Mar 28, 2017 at 11:29:31AM +0200, Vincent Guittot wrote:
> On 27 March 2017 at 18:50, Peter Zijlstra <[email protected]> wrote:
> > On Fri, Mar 24, 2017 at 02:08:58PM +0000, Juri Lelli wrote:
> >> Worker kthread needs to be able to change frequency for all other
> >> threads.
> >>
> >> Make it special, just under STOP class.
> >
> > *yuck* ;-)
> >
> > So imagine our I2C/SPI bus is 'busy' and its mutex taken, then this
> > 'special' task will need to boost it. Now add BWI to your thinking and
> > shudder.
> >
> >
> > On IRC broonie mentioned that:
> >
> > - most PMIC operations are fire and forget (no need to wait for a
> > response).
> > - PMIC 'packets' are 'small'.
> > - SPI has the possibility to push stuff on the queue.
> >
> > Taken together this seems to suggest we can rework cpufreq drivers to
> > function in-context, either directly push the packet on the bus if
> > available, or queue it and let whoever owns it sort it without blocking.
> >
> > It might be possible to rework/augment I2C to also support pushing stuff
> > on a queue.
>
> But sending the new voltage value to the PMIC is only part of the
> sequence. When cpufreq sets a new OPP, it does:
>
> - set the new voltage
> - wait for the voltage to settle down
> - set the new clock frequency
>
> You may even have to switch to an intermediate clock source.
>
> When such a sequence is managed by the kernel, we can't easily get rid
> of a kthread.

That stinks :-(

The whole blocking and bandwidth inheritance stuff gives me nightmares.
Makes all this stuff almost impossible to analyse.

Esp. if the bus (SPI/I2C) is shared with some other 'expensive' device
like a DSP or MMC flash crud.

2017-03-29 22:47:21

by Rafael J. Wysocki

Subject: Re: [RFD PATCH 4/5] sched/cpufreq_schedutil: always consider all CPUs when deciding next freq

On Friday, March 24, 2017 02:08:59 PM Juri Lelli wrote:
> No assumption can be made upon the rate at which frequency updates get
> triggered, as there are scheduling policies (like SCHED_DEADLINE) which
> don't trigger them so frequently.
>
> Remove such assumption from the code.

But the util/max values for idle CPUs may be stale, no?

Thanks,
Rafael

2017-03-30 08:58:16

by Juri Lelli

Subject: Re: [RFD PATCH 4/5] sched/cpufreq_schedutil: always consider all CPUs when deciding next freq

Hi,

On 30/03/17 00:41, Rafael J. Wysocki wrote:
> On Friday, March 24, 2017 02:08:59 PM Juri Lelli wrote:
> > No assumption can be made upon the rate at which frequency updates get
> > triggered, as there are scheduling policies (like SCHED_DEADLINE) which
> > don't trigger them so frequently.
> >
> > Remove such assumption from the code.
>
> But the util/max values for idle CPUs may be stale, no?
>

Right, that might be a problem. A proper solution I think would be to
remotely update such values for idle CPUs, and I believe Vincent is
working on a patch for that.

As mid-term workarounds, slightly modifying the current behaviour, two
options come to mind:

- consider TICK_NSEC (continue) only when SCHED_CPUFREQ_DL is not set
- remove the CFS contribution (without triggering a freq update) when a
  CPU enters IDLE; this might not work well, though, as we probably want
  to keep the blocked util contribution in for a bit

What do you think is the way to go?

Thanks,

- Juri

2017-03-30 13:22:21

by Vincent Guittot

Subject: Re: [RFD PATCH 4/5] sched/cpufreq_schedutil: always consider all CPUs when deciding next freq

On 30 March 2017 at 10:58, Juri Lelli <[email protected]> wrote:
> Hi,
>
> On 30/03/17 00:41, Rafael J. Wysocki wrote:
>> On Friday, March 24, 2017 02:08:59 PM Juri Lelli wrote:
>> > No assumption can be made upon the rate at which frequency updates get
>> > triggered, as there are scheduling policies (like SCHED_DEADLINE) which
>> > don't trigger them so frequently.
>> >
>> > Remove such assumption from the code.
>>
>> But the util/max values for idle CPUs may be stale, no?
>>
>
> Right, that might be a problem. A proper solution I think would be to
> remotely update such values for idle CPUs, and I believe Vincent is
> working on a patch for that.

Yes. I'm working on a patch that will regularly update the blocked
load/utilization of idle CPUs. This update will be done at a slow pace
to make sure that utilization and load are decayed regularly.

>
> As mid-term workarounds, slightly modifying the current behaviour, two
> options come to mind:
>
> - consider TICK_NSEC (continue) only when SCHED_CPUFREQ_DL is not set
> - remove the CFS contribution (without triggering a freq update) when a
>   CPU enters IDLE; this might not work well, though, as we probably want
>   to keep the blocked util contribution in for a bit
>
> What do you think is the way to go?
>
> Thanks,
>
> - Juri

2017-03-30 15:50:15

by Vikram Mulukutla

Subject: Re: [RFD PATCH 3/5] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE



> OK
>
> So there are two pieces here.
>
> One is that if we want *all* drivers to work with schedutil, we need to
> keep the kthread for the ones that will never be reworked (because
> nobody cares etc). But then perhaps the kthread implementation may be
> left alone (because nobody cares etc).
>
> The second one is that there are drivers operating in-context that work
> with schedutil already, so I don't see major obstacles to making more
> drivers work that way. That would be only a matter of reworking the
> drivers in question.
>
> Thanks,
> Rafael

There are some MSM platforms that do need a kthread and would love to
use schedutil. This is all mainly due to the point that Vincent raised;
having to actually wait for voltage transitions before clock switches.
I can't speak about the future, but that's the situation right now.
Leaving the kthread alone for now would be appreciated!

Thanks,
Vikram

2017-03-30 20:13:55

by Rafael J. Wysocki

Subject: Re: [RFD PATCH 4/5] sched/cpufreq_schedutil: always consider all CPUs when deciding next freq

On Thu, Mar 30, 2017 at 10:58 AM, Juri Lelli <[email protected]> wrote:
> Hi,

Hi,

> On 30/03/17 00:41, Rafael J. Wysocki wrote:
>> On Friday, March 24, 2017 02:08:59 PM Juri Lelli wrote:
>> > No assumption can be made upon the rate at which frequency updates get
>> > triggered, as there are scheduling policies (like SCHED_DEADLINE) which
>> > don't trigger them so frequently.
>> >
>> > Remove such assumption from the code.
>>
>> But the util/max values for idle CPUs may be stale, no?
>>
>
> Right, that might be a problem. A proper solution I think would be to
> remotely update such values for idle CPUs, and I believe Vincent is
> working on a patch for that.
>
> As mid-term workarounds, slightly modifying the current behaviour, two
> options come to mind:
>
> - consider TICK_NSEC (continue) only when SCHED_CPUFREQ_DL is not set
> - remove the CFS contribution (without triggering a freq update) when a
>   CPU enters IDLE; this might not work well, though, as we probably want
>   to keep the blocked util contribution in for a bit
>
> What do you think is the way to go?

Well, do we want SCHED_DEADLINE util contribution to be there even for
idle CPUs?

Thanks,
Rafael

2017-03-30 20:28:11

by Rafael J. Wysocki

Subject: Re: [RFD PATCH 3/5] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE

On Thursday, March 30, 2017 08:50:11 AM Vikram Mulukutla wrote:
>
> > OK
> >
> > So there are two pieces here.
> >
> > One is that if we want *all* drivers to work with schedutil, we need to
> > keep the kthread for the ones that will never be reworked (because
> > nobody cares etc). But then perhaps the kthread implementation may be
> > left alone (because nobody cares etc).
> >
> > The second one is that there are drivers operating in-context that work
> > with schedutil already, so I don't see major obstacles to making more
> > drivers work that way. That would be only a matter of reworking the
> > drivers in question.
> >
> > Thanks,
> > Rafael
>
> There are some MSM platforms that do need a kthread and would love to
> use schedutil. This is all mainly due to the point that Vincent raised;
> having to actually wait for voltage transitions before clock switches.
> I can't speak about the future, but that's the situation right now.
> Leaving the kthread alone for now would be appreciated!

I was not arguing for removing the kthread (quite opposite rather).

My point was that *if* it is viable to rework drivers to operate in-context,
that would be the way to go IMO instead of messing up with the kthread thing.

Thanks,
Rafael

2017-03-31 07:27:01

by Juri Lelli

Subject: Re: [RFD PATCH 3/5] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE

On 30/03/17 22:22, Rafael J. Wysocki wrote:
> On Thursday, March 30, 2017 08:50:11 AM Vikram Mulukutla wrote:
> >
> > > OK
> > >
> > > So there are two pieces here.
> > >
> > > One is that if we want *all* drivers to work with schedutil, we need to
> > > keep the kthread for the ones that will never be reworked (because
> > > nobody cares etc). But then perhaps the kthread implementation may be
> > > left alone (because nobody cares etc).
> > >
> > > The second one is that there are drivers operating in-context that work
> > > with schedutil already, so I don't see major obstacles to making more
> > > drivers work that way. That would be only a matter of reworking the
> > > drivers in question.
> > >
> > > Thanks,
> > > Rafael
> >
> > There are some MSM platforms that do need a kthread and would love to
> > use schedutil. This is all mainly due to the point that Vincent raised;
> > having to actually wait for voltage transitions before clock switches.
> > I can't speak about the future, but that's the situation right now.
> > Leaving the kthread alone for now would be appreciated!
>
> I was not arguing for removing the kthread (quite opposite rather).
>
> My point was that *if* it is viable to rework drivers to operate in-context,
> that would be the way to go IMO instead of messing up with the kthread thing.
>

Right, I agree. Problem is that in principle we might still want to use
DEADLINE with the other platforms (MSM being a perfect example), so IMHO
we should still try to find a solution for the kthread anyway.

2017-03-31 07:31:34

by Juri Lelli

Subject: Re: [RFD PATCH 4/5] sched/cpufreq_schedutil: always consider all CPUs when deciding next freq

On 30/03/17 22:13, Rafael J. Wysocki wrote:
> On Thu, Mar 30, 2017 at 10:58 AM, Juri Lelli <[email protected]> wrote:
> > Hi,
>
> Hi,
>
> > On 30/03/17 00:41, Rafael J. Wysocki wrote:
> >> On Friday, March 24, 2017 02:08:59 PM Juri Lelli wrote:
> >> > No assumption can be made upon the rate at which frequency updates get
> >> > triggered, as there are scheduling policies (like SCHED_DEADLINE) which
> >> > don't trigger them so frequently.
> >> >
> >> > Remove such assumption from the code.
> >>
> >> But the util/max values for idle CPUs may be stale, no?
> >>
> >
> > Right, that might be a problem. A proper solution I think would be to
> > remotely update such values for idle CPUs, and I believe Vincent is
> > working on a patch for that.
> >
> > As mid-term workarounds, slightly modifying the current behaviour, two
> > options come to mind:
> >
> > - consider TICK_NSEC (continue) only when SCHED_CPUFREQ_DL is not set
> > - remove the CFS contribution (without triggering a freq update) when a
> >   CPU enters IDLE; this might not work well, though, as we probably want
> >   to keep the blocked util contribution in for a bit
> >
> > What do you think is the way to go?
>
> Well, do we want SCHED_DEADLINE util contribution to be there even for
> idle CPUs?
>

DEADLINE util contribution is removed, even if the CPU is idle, by the
reclaiming mechanism when we know (applying GRUB algorithm rules [1])
that it can't be used anymore by a task (roughly speaking). So, we
shouldn't have this problem in the DEADLINE case.

[1] https://marc.info/?l=linux-kernel&m=149029880524038

2017-03-31 09:03:44

by Rafael J. Wysocki

Subject: Re: [RFD PATCH 4/5] sched/cpufreq_schedutil: always consider all CPUs when deciding next freq

On Fri, Mar 31, 2017 at 9:31 AM, Juri Lelli <[email protected]> wrote:
> On 30/03/17 22:13, Rafael J. Wysocki wrote:
>> On Thu, Mar 30, 2017 at 10:58 AM, Juri Lelli <[email protected]> wrote:
>> > Hi,
>>
>> Hi,
>>
>> > On 30/03/17 00:41, Rafael J. Wysocki wrote:
>> >> On Friday, March 24, 2017 02:08:59 PM Juri Lelli wrote:
>> >> > No assumption can be made upon the rate at which frequency updates get
>> >> > triggered, as there are scheduling policies (like SCHED_DEADLINE) which
>> >> > don't trigger them so frequently.
>> >> >
>> >> > Remove such assumption from the code.
>> >>
>> >> But the util/max values for idle CPUs may be stale, no?
>> >>
>> >
>> > Right, that might be a problem. A proper solution I think would be to
>> > remotely update such values for idle CPUs, and I believe Vincent is
>> > working on a patch for that.
>> >
>> > As mid-term workarounds, slightly modifying the current behaviour, two
>> > options come to mind:
>> >
>> > - consider TICK_NSEC (continue) only when SCHED_CPUFREQ_DL is not set
>> > - remove the CFS contribution (without triggering a freq update) when a
>> >   CPU enters IDLE; this might not work well, though, as we probably want
>> >   to keep the blocked util contribution in for a bit
>> >
>> > What do you think is the way to go?
>>
>> Well, do we want SCHED_DEADLINE util contribution to be there even for
>> idle CPUs?
>>
>
> DEADLINE util contribution is removed, even if the CPU is idle, by the
> reclaiming mechanism when we know (applying GRUB algorithm rules [1])
> that it can't be used anymore by a task (roughly speaking). So, we
> shouldn't have this problem in the DEADLINE case.
>
> [1] https://marc.info/?l=linux-kernel&m=149029880524038

OK

Why don't you store the contributions from DL and CFS separately, then
(say, as util_dl, util_cfs, respectively) and only discard the CFS one
if delta_ns > TICK_NSEC?

2017-03-31 09:16:40

by Juri Lelli

Subject: Re: [RFD PATCH 4/5] sched/cpufreq_schedutil: always consider all CPUs when deciding next freq

On 31/03/17 11:03, Rafael J. Wysocki wrote:
> On Fri, Mar 31, 2017 at 9:31 AM, Juri Lelli <[email protected]> wrote:
> > On 30/03/17 22:13, Rafael J. Wysocki wrote:
> >> On Thu, Mar 30, 2017 at 10:58 AM, Juri Lelli <[email protected]> wrote:
> >> > Hi,
> >>
> >> Hi,
> >>
> >> > On 30/03/17 00:41, Rafael J. Wysocki wrote:
> >> >> On Friday, March 24, 2017 02:08:59 PM Juri Lelli wrote:
> >> >> > No assumption can be made upon the rate at which frequency updates get
> >> >> > triggered, as there are scheduling policies (like SCHED_DEADLINE) which
> >> >> > don't trigger them so frequently.
> >> >> >
> >> >> > Remove such assumption from the code.
> >> >>
> >> >> But the util/max values for idle CPUs may be stale, no?
> >> >>
> >> >
> >> > Right, that might be a problem. A proper solution I think would be to
> >> > remotely update such values for idle CPUs, and I believe Vincent is
> >> > working on a patch for that.
> >> >
> >> > As mid-term workarounds, slightly modifying the current behaviour, two
> >> > options come to mind:
> >> >
> >> > - consider TICK_NSEC (continue) only when SCHED_CPUFREQ_DL is not set
> >> > - remove the CFS contribution (without triggering a freq update) when a
> >> >   CPU enters IDLE; this might not work well, though, as we probably want
> >> >   to keep the blocked util contribution in for a bit
> >> >
> >> > What do you think is the way to go?
> >>
> >> Well, do we want SCHED_DEADLINE util contribution to be there even for
> >> idle CPUs?
> >>
> >
> > DEADLINE util contribution is removed, even if the CPU is idle, by the
> > reclaiming mechanism when we know (applying GRUB algorithm rules [1])
> > that it can't be used anymore by a task (roughly speaking). So, we
> > shouldn't have this problem in the DEADLINE case.
> >
> > [1] https://marc.info/?l=linux-kernel&m=149029880524038
>
> OK
>
> Why don't you store the contributions from DL and CFS separately, then
> (say, as util_dl, util_cfs, respectively) and only discard the CFS one
> if delta_ns > TICK_NSEC?

Sure, this should work as well. I'll try this approach for the next version.

Thanks,

- Juri