From: Zhaoyang Huang <[email protected]>
Time spent in RT, DL and IRQ context can be regarded as time lost to a
CFS task. Some timing measurements need to know approximately how that
lost time is distributed, and the utilization accounting values can
provide this (nivcsw alone is sometimes not enough). This commit
introduces a helper function for that purpose.
e.g.
Effective part of A = Total_time * cpu_util_cfs / cpu_util
Timing value A
(should be a process that lasts for several TICKs, or statistics of a
repeated process)
Timing start
|
|
preempted by RT, DL or IRQ
|\
| This period is an involuntary loss of the CPU; we need to know how long it lasts
|/
sched in again
|
|
|
Timing end
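
As a rough illustration of the formula above, here is a minimal userspace
sketch in C. It is not part of the patch: effective_part() and its parameter
names are hypothetical, and the inputs stand in for values that would come
from the scheduler's utilization accounting.

```c
/*
 * Hypothetical helper, for illustration only (not in the patch).
 * Scales a measured interval by the CFS share of total CPU utilization:
 *   effective part = total_time * util_cfs / util_total
 */
unsigned long effective_part(unsigned long total_time,
			     unsigned long util_cfs,
			     unsigned long util_total)
{
	/* Assumption: with no utilization recorded, attribute nothing. */
	if (util_total == 0)
		return 0;

	return total_time * util_cfs / util_total;
}
```

With made-up numbers, a 40 ms interval on a CPU whose CFS utilization is 300
out of a total of 400 gives effective_part(40, 300, 400) == 30, i.e. about
30 ms attributed to CFS and about 10 ms lost to RT, DL or IRQ.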
Signed-off-by: Zhaoyang Huang <[email protected]>
---
Changes in v2: use two parameters to pass se_prop and rq_prop out
---
include/linux/sched.h | 3 +++
kernel/sched/core.c | 35 +++++++++++++++++++++++++++++++++++
2 files changed, 38 insertions(+)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 77f01ac385f7..d6d5914fad10 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2318,6 +2318,9 @@ static inline bool owner_on_cpu(struct task_struct *owner)
/* Returns effective CPU energy utilization, as seen by the scheduler */
unsigned long sched_cpu_util(int cpu);
+/* Returns the task's and the CFS rq's share of total CPU utilization, in percent */
+unsigned long cfs_prop_by_util(struct task_struct *tsk, unsigned long *se_prop,
+ unsigned long *rq_prop);
#endif /* CONFIG_SMP */
#ifdef CONFIG_RSEQ
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 802551e0009b..b8c29dff5d37 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7494,6 +7494,41 @@ unsigned long sched_cpu_util(int cpu)
{
return effective_cpu_util(cpu, cpu_util_cfs(cpu), ENERGY_UTIL, NULL);
}
+
+/*
+ * Estimate what share of a timing interval was consumed by @tsk and by all
+ * CFS tasks on @tsk's CPU, as a percentage of that CPU's total utilization.
+ * The estimate is based on util_avg, which is tracked as a geometric series
+ * that decays the load by y^32 = 0.5 (the unit is 1ms).
+ * It is therefore only suitable for intervals that last at least several
+ * TICKs, or for statistics of a repeated timing value.
+ * This function is derived from effective_cpu_util() but does not limit
+ * the utilization to the CPU's capacity.
+ * @se_prop and @rq_prop are valid only when the return value is 1.
+ */
+unsigned long cfs_prop_by_util(struct task_struct *tsk, unsigned long *se_prop,
+ unsigned long *rq_prop)
+{
+ unsigned int cpu = task_cpu(tsk);
+ struct sched_entity *se = &tsk->se;
+ struct rq *rq = cpu_rq(cpu);
+ unsigned long util, irq, max;
+
+ if (tsk->sched_class != &fair_sched_class)
+ return 0;
+
+ max = arch_scale_cpu_capacity(cpu);
+ irq = cpu_util_irq(rq);
+
+ util = cpu_util_rt(rq) + cpu_util_cfs(cpu) + cpu_util_dl(rq);
+ util = scale_irq_capacity(util, irq, max);
+ util += irq;
+
+	*se_prop = util ? se->avg.util_avg * 100 / util : 0; /* avoid div by zero */
+	*rq_prop = util ? cpu_util_cfs(cpu) * 100 / util : 0;
+ return 1;
+}
+
#endif /* CONFIG_SMP */
/**
--
2.25.1
On Thu, Feb 22, 2024 at 05:22:19PM +0800, zhaoyang.huang wrote:
> From: Zhaoyang Huang <[email protected]>
>
> As RT, DL, IRQ time could be deemed as lost time of CFS's task, some
> timing value want to know the distribution of how these spread
> approximately by using utilization account value (nivcsw is not enough
> sometimes). This commit would like to introduce a helper function to
> achieve this goal.
Maybe I'm just thick but this still looks like alphabet soup to me.
Can you try to explain why this matters, or maybe get help from the
scheduler folks to help with explaining the concepts?