2021-12-23 12:30:30

by Cruz Zhao

[permalink] [raw]
Subject: [PATCH 2/2] sched/core: Uncookied force idle accounting per cpu

Forced idle can be divided into two types, forced idle with cookie'd task
running on it SMT sibling, and forced idle with uncookie'd task running
on it SMT sibling, which should be accounting to measure the cost of
enabling core scheduling too.

This patch accounts the forced idle time with uncookie'd task, and the
sum of both.

A few details:
- Uncookied forceidle time and total forceidle time is displayed via
the last two columns of /proc/stat.
- Uncookied forceidle time is ony accounted when this cpu is forced
idle and a sibling hyperthread is running with an uncookie'd task.

Signed-off-by: Cruz Zhao <[email protected]>
---
fs/proc/stat.c | 13 ++++++++++++-
include/linux/kernel_stat.h | 1 +
kernel/sched/core.c | 3 +--
kernel/sched/core_sched.c | 7 ++++---
4 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 3a2fbc9..21607cf 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -110,7 +110,7 @@ static int show_stat(struct seq_file *p, void *v)
int i, j;
u64 user, nice, system, idle, iowait, irq, softirq, steal;
#ifdef CONFIG_SCHED_CORE
- u64 cookied_forceidle = 0;
+ u64 cookied_forceidle, uncookied_forceidle, forceidle;
#endif
u64 guest, guest_nice;
u64 sum = 0;
@@ -121,6 +121,9 @@ static int show_stat(struct seq_file *p, void *v)
user = nice = system = idle = iowait =
irq = softirq = steal = 0;
guest = guest_nice = 0;
+#ifdef CONFIG_SCHED_CORE
+ cookied_forceidle = uncookied_forceidle = forceidle = 0;
+#endif
getboottime64(&boottime);
/* shift boot timestamp according to the timens offset */
timens_sub_boottime(&boottime);
@@ -145,6 +148,8 @@ static int show_stat(struct seq_file *p, void *v)
sum += arch_irq_stat_cpu(i);
#ifdef CONFIG_SCHED_CORE
cookied_forceidle += cpustat[CPUTIME_COOKIED_FORCEIDLE];
+ uncookied_forceidle += cpustat[CPUTIME_UNCOOKIED_FORCEIDLE];
+ forceidle = cookied_forceidle + uncookied_forceidle;
#endif

for (j = 0; j < NR_SOFTIRQS; j++) {
@@ -168,6 +173,8 @@ static int show_stat(struct seq_file *p, void *v)
seq_put_decimal_ull(p, " ", nsec_to_clock_t(guest_nice));
#ifdef CONFIG_SCHED_CORE
seq_put_decimal_ull(p, " ", nsec_to_clock_t(cookied_forceidle));
+ seq_put_decimal_ull(p, " ", nsec_to_clock_t(uncookied_forceidle));
+ seq_put_decimal_ull(p, " ", nsec_to_clock_t(forceidle));
#endif
seq_putc(p, '\n');

@@ -190,6 +197,8 @@ static int show_stat(struct seq_file *p, void *v)
guest_nice = cpustat[CPUTIME_GUEST_NICE];
#ifdef CONFIG_SCHED_CORE
cookied_forceidle = cpustat[CPUTIME_COOKIED_FORCEIDLE];
+ uncookied_forceidle = cpustat[CPUTIME_UNCOOKIED_FORCEIDLE];
+ forceidle = cookied_forceidle + uncookied_forceidle;
#endif
seq_printf(p, "cpu%d", i);
seq_put_decimal_ull(p, " ", nsec_to_clock_t(user));
@@ -204,6 +213,8 @@ static int show_stat(struct seq_file *p, void *v)
seq_put_decimal_ull(p, " ", nsec_to_clock_t(guest_nice));
#ifdef CONFIG_SCHED_CORE
seq_put_decimal_ull(p, " ", nsec_to_clock_t(cookied_forceidle));
+ seq_put_decimal_ull(p, " ", nsec_to_clock_t(uncookied_forceidle));
+ seq_put_decimal_ull(p, " ", nsec_to_clock_t(forceidle));
#endif
seq_putc(p, '\n');
}
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index a21b065..23945c1 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -30,6 +30,7 @@ enum cpu_usage_stat {
CPUTIME_GUEST_NICE,
#ifdef CONFIG_SCHED_CORE
CPUTIME_COOKIED_FORCEIDLE,
+ CPUTIME_UNCOOKIED_FORCEIDLE,
#endif
NR_STATS,
};
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f4f4b24..16d937e4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5822,8 +5822,7 @@ static inline struct task_struct *pick_task(struct rq *rq)
}

if (rq->core->core_forceidle_count) {
- if (cookie)
- rq->core->core_forceidle_start = rq_clock(rq->core);
+ rq->core->core_forceidle_start = rq_clock(rq->core);
rq->core->core_forceidle_occupation = occ;
}

diff --git a/kernel/sched/core_sched.c b/kernel/sched/core_sched.c
index bc5f45f..89bd49d 100644
--- a/kernel/sched/core_sched.c
+++ b/kernel/sched/core_sched.c
@@ -265,11 +265,12 @@ void sched_core_account_forceidle(struct rq *rq)
rq_i = cpu_rq(i);
p = rq_i->core_pick ?: rq_i->curr;

- if (!rq->core->core_cookie)
- continue;
if (p == rq_i->idle && rq_i->nr_running) {
cpustat = kcpustat_cpu(i).cpustat;
- cpustat[CPUTIME_COOKIED_FORCEIDLE] += delta;
+ if (rq->core->core_cookie)
+ cpustat[CPUTIME_COOKIED_FORCEIDLE] += delta;
+ else
+ cpustat[CPUTIME_UNCOOKIED_FORCEIDLE] += delta;
}
}

--
1.8.3.1



2022-01-05 01:56:34

by Josh Don

[permalink] [raw]
Subject: Re: [PATCH 2/2] sched/core: Uncookied force idle accounting per cpu

Hi Cruz,

On Thu, Dec 23, 2021 at 4:30 AM Cruz Zhao <[email protected]> wrote:
>
> Forced idle can be divided into two types, forced idle with cookie'd task
> running on it SMT sibling, and forced idle with uncookie'd task running
> on it SMT sibling, which should be accounting to measure the cost of
> enabling core scheduling too.
>
> This patch accounts the forced idle time with uncookie'd task, and the
> sum of both.
>
> A few details:
> - Uncookied forceidle time and total forceidle time is displayed via
> the last two columns of /proc/stat.
> - Uncookied forceidle time is ony accounted when this cpu is forced
> idle and a sibling hyperthread is running with an uncookie'd task.

What is the purpose/use-case to account the forced idle from
uncookie'd tasks? The forced-idle from cookie'd tasks represents
capacity loss due to adding in some cookie'd tasks. If forced idle is
high, that can be rectified by making some changes to the cookie'd
tasks (ie. their affinity, cpu budget, etc.).

2022-01-05 11:33:57

by Cruz Zhao

[permalink] [raw]
Subject: Re: [PATCH 2/2] sched/core: Uncookied force idle accounting per cpu



在 2022/1/5 上午9:56, Josh Don 写道:
> Hi Cruz,
>
> On Thu, Dec 23, 2021 at 4:30 AM Cruz Zhao <[email protected]> wrote:
>>
>> Forced idle can be divided into two types, forced idle with cookie'd task
>> running on it SMT sibling, and forced idle with uncookie'd task running
>> on it SMT sibling, which should be accounting to measure the cost of
>> enabling core scheduling too.
>>
>> This patch accounts the forced idle time with uncookie'd task, and the
>> sum of both.
>>
>> A few details:
>> - Uncookied forceidle time and total forceidle time is displayed via
>> the last two columns of /proc/stat.
>> - Uncookied forceidle time is ony accounted when this cpu is forced
>> idle and a sibling hyperthread is running with an uncookie'd task.
>
When we care about capacity loss, we care about all but not some of it.
The forced idle time from uncookie'd task is actually caused by the
cookie'd task in runqueue indirectly, and it's more accurate to measure
the capacity loss with the sum of cookie'd forced idle time and
uncookie'd forced idle time, as far as I'm concerned.

Assuming cpu x and cpu y are a pair of smt siblings, consider the
following scenarios:
1. There's a cookie'd task A running on cpu x, and there're 4 uncookie'd
tasks B~E running on cpu y. For cpu x, there will be 80% forced idle
time(from uncookie'd task); for cpu y, there will be 20% forced idle
time(from cookie'd task).
2. There's a uncookie'd task A running on cpu x, and there're 4 cookie'd
tasks B~E running on cpu y. For cpu x, there will be 80% forced idle
time(from cookie'd task); for cpu y, there will be 20% forced idle
time(from uncookie'd task).
The scenario1 can recurrent by stress-ng(scenario2 can recurrent similary):
(cookie'd)taskset -c x stress-ng -c 1 -l 100
(uncookie'd)taskset -c y stress-ng -c 4 -l 100

In the above two scenarios, the capacity loss is 1 cpu, but in
scenario1, the cookie'd forced idle time tells us 20%cpu loss, in
scenario2, the cookie'd forced idle time tells us 80% forced idle time,
which are not accurate. It'll be more accurate with the sum of cookie'd
forced idle time and uncookie'd forced idle time.

Best,
Cruz Zhao

> What is the purpose/use-case to account the forced idle from
> uncookie'd tasks? The forced-idle from cookie'd tasks represents
> capacity loss due to adding in some cookie'd tasks. If forced idle is
> high, that can be rectified by making some changes to the cookie'd
> tasks (ie. their affinity, cpu budget, etc.).

2022-01-05 20:59:44

by Josh Don

[permalink] [raw]
Subject: Re: [PATCH 2/2] sched/core: Uncookied force idle accounting per cpu

On Wed, Jan 5, 2022 at 3:33 AM cruzzhao <[email protected]> wrote:
>
> When we care about capacity loss, we care about all but not some of it.
> The forced idle time from uncookie'd task is actually caused by the
> cookie'd task in runqueue indirectly, and it's more accurate to measure
> the capacity loss with the sum of cookie'd forced idle time and
> uncookie'd forced idle time, as far as I'm concerned.
>
> Assuming cpu x and cpu y are a pair of smt siblings, consider the
> following scenarios:
> 1. There's a cookie'd task A running on cpu x, and there're 4 uncookie'd
> tasks B~E running on cpu y. For cpu x, there will be 80% forced idle
> time(from uncookie'd task); for cpu y, there will be 20% forced idle
> time(from cookie'd task).
> 2. There's a uncookie'd task A running on cpu x, and there're 4 cookie'd
> tasks B~E running on cpu y. For cpu x, there will be 80% forced idle
> time(from cookie'd task); for cpu y, there will be 20% forced idle
> time(from uncookie'd task).
> The scenario1 can recurrent by stress-ng(scenario2 can recurrent similary):
> (cookie'd)taskset -c x stress-ng -c 1 -l 100
> (uncookie'd)taskset -c y stress-ng -c 4 -l 100
>
> In the above two scenarios, the capacity loss is 1 cpu, but in
> scenario1, the cookie'd forced idle time tells us 20%cpu loss, in
> scenario2, the cookie'd forced idle time tells us 80% forced idle time,
> which are not accurate. It'll be more accurate with the sum of cookie'd
> forced idle time and uncookie'd forced idle time.

Why do you need this separated out into two fields then? Could we just
combine the uncookie'd and cookie'd forced idle into a single sum?

IMO it is fine to account the forced idle from uncookie'd tasks, but
we should then also change the task accounting to do the same, for
consistency.

2022-01-06 12:05:10

by Cruz Zhao

[permalink] [raw]
Subject: Re: [PATCH 2/2] sched/core: Uncookied force idle accounting per cpu


在 2022/1/6 上午4:59, Josh Don 写道:

It's a good idea to combine them into a single sum. I separated them in
order to be consistent with the task accounting and for easy to understand.
As for change the task accounting, I've tried but I haven't found a
proper method to do so. I've considered the following methods:
1. Account the uncookie'd force idle time to the uncookie'd task, but
it'll be hard to trace the uncookie'd task.
2. Account the uncookie'd force idle time to the cookie'd task in the
core_tree of the core, but it will cost a lot on traversing the core_tree.

Many thanks for suggestions.
Best,
Cruz Zhao

> Why do you need this separated out into two fields then? Could we just
> combine the uncookie'd and cookie'd forced idle into a single sum?
>
> IMO it is fine to account the forced idle from uncookie'd tasks, but
> we should then also change the task accounting to do the same, for
> consistency.

2022-01-06 20:03:35

by Josh Don

[permalink] [raw]
Subject: Re: [PATCH 2/2] sched/core: Uncookied force idle accounting per cpu

On Thu, Jan 6, 2022 at 4:05 AM cruzzhao <[email protected]> wrote:
>
>
> 在 2022/1/6 上午4:59, Josh Don 写道:
>
> It's a good idea to combine them into a single sum. I separated them in
> order to be consistent with the task accounting and for easy to understand.
> As for change the task accounting, I've tried but I haven't found a
> proper method to do so. I've considered the following methods:
> 1. Account the uncookie'd force idle time to the uncookie'd task, but
> it'll be hard to trace the uncookie'd task.

Not sure what you mean there, I think you just need to add

--- a/kernel/sched/core_sched.c
+++ b/kernel/sched/core_sched.c
@@ -294,7 +294,7 @@ void sched_core_account_forceidle(struct rq *rq)
rq_i = cpu_rq(i);
p = rq_i->core_pick ?: rq_i->curr;

- if (!p->core_cookie)
+ if (p == rq_i->idle)
continue;

__schedstat_add(p->stats.core_forceidle_sum, delta)

> 2. Account the uncookie'd force idle time to the cookie'd task in the
> core_tree of the core, but it will cost a lot on traversing the core_tree.
>
> Many thanks for suggestions.
> Best,
> Cruz Zhao
>
> > Why do you need this separated out into two fields then? Could we just
> > combine the uncookie'd and cookie'd forced idle into a single sum?
> >
> > IMO it is fine to account the forced idle from uncookie'd tasks, but
> > we should then also change the task accounting to do the same, for
> > consistency.