2021-05-19 18:08:37

by Odin Ugedal

Subject: [PATCH 0/3] sched/fair: Fix load decay issues related to throttling

Here is a follow-up with some fairness fixes related to throttling, PELT and
load decay in general.

It is related to the discussion in:

https://lore.kernel.org/lkml/[email protected] and
https://lkml.kernel.org/r/[email protected]

Tested on v5.13-rc2 (since that contains the fix from above^).

The patch descriptions should make sense on their own, and I have attached some
simple reproduction scripts at the end of this mail. I also appended a patch
fixing some ascii art that I have looked at several times without
understanding, since it turns out it breaks when tabs are not 8 spaces wide. I
can submit that as a separate patch if necessary.

Also, I have no idea what to call the "insert_on_unthrottle" var, so feel
free to suggest a better name.


There are probably "better" and more reliable ways to reproduce this, but
these work for me "most of the time", and they give an ok context imo. Throttling
is not deterministic, so keep that in mind. I have been testing with
CONFIG_HZ=250, so if you use =1000 (or anything else), you might get different
results or find it harder to reproduce.

Reprod script for "Add tg_load_contrib cfs_rq decay checking":
--- bash start
CGROUP=/sys/fs/cgroup/slice

function run_sandbox {
local CG="$1"
local LCPU="$2"
local SHARES="$3"
local CMD="$4"

local PIPE="$(mktemp -u)"
mkfifo "$PIPE"
sh -c "read < $PIPE ; exec $CMD" &
local TASK="$!"
mkdir -p "$CG/sub"
tee "$CG"/cgroup.subtree_control <<< "+cpuset +cpu"
tee "$CG"/sub/cgroup.procs <<< "$TASK"
tee "$CG"/sub/cpuset.cpus <<< "$LCPU"
tee "$CG"/sub/cpu.weight <<< "$SHARES"
tee "$CG"/cpu.max <<< "10000 100000"

sleep .1
tee "$PIPE" <<< sandox_done
rm "$PIPE"
}

mkdir -p "$CGROUP"
tee "$CGROUP"/cgroup.subtree_control <<< "+cpuset +cpu"

run_sandbox "$CGROUP/cg-1" "0" 100 "stress --cpu 1"
run_sandbox "$CGROUP/cg-2" "3" 100 "stress --cpu 1"
sleep 1.02
tee "$CGROUP"/cg-1/sub/cpuset.cpus <<< "1"
sleep 1.05
tee "$CGROUP"/cg-1/sub/cpuset.cpus <<< "2"
sleep 1.07
tee "$CGROUP"/cg-1/sub/cpuset.cpus <<< "3"

sleep 2

tee "$CGROUP"/cg-1/cpu.max <<< "max"
tee "$CGROUP"/cg-2/cpu.max <<< "max"

read
killall stress
sleep .2
rmdir /sys/fs/cgroup/slice/{cg-{1,2}{/sub,},}

# Often gives:
# cat /sys/kernel/debug/sched/debug | grep ":/slice" -A 28 | egrep "(:/slice)|tg_load_avg"
#
# cfs_rq[3]:/slice/cg-2/sub
# .tg_load_avg_contrib : 1024
# .tg_load_avg : 1024
# cfs_rq[3]:/slice/cg-1/sub
# .tg_load_avg_contrib : 1023
# .tg_load_avg : 1023
# cfs_rq[3]:/slice/cg-1
# .tg_load_avg_contrib : 1040
# .tg_load_avg : 2062
# cfs_rq[3]:/slice/cg-2
# .tg_load_avg_contrib : 1013
# .tg_load_avg : 1013
# cfs_rq[3]:/slice
# .tg_load_avg_contrib : 1540
# .tg_load_avg : 1540
--- bash end


Reprod script for "sched/fair: Correctly insert cfs_rq's to list on unthrottle":
--- bash start
CGROUP=/sys/fs/cgroup/slice
TMP_CG=/sys/fs/cgroup/tmp
OLD_CG=/sys/fs/cgroup"$(cat /proc/self/cgroup | cut -c4-)"
function run_sandbox {
local CG="$1"
local LCPU="$2"
local SHARES="$3"
local CMD="$4"

local PIPE="$(mktemp -u)"
mkfifo "$PIPE"
sh -c "read < $PIPE ; exec $CMD" &
local TASK="$!"
mkdir -p "$CG/sub"
tee "$CG"/cgroup.subtree_control <<< "+cpuset +cpu"
tee "$CG"/sub/cpuset.cpus <<< "$LCPU"
tee "$CG"/sub/cgroup.procs <<< "$TASK"
tee "$CG"/sub/cpu.weight <<< "$SHARES"

sleep .01
tee "$PIPE" <<< sandox_done
rm "$PIPE"
}

mkdir -p "$CGROUP"
mkdir -p "$TMP_CG"
tee "$CGROUP"/cgroup.subtree_control <<< "+cpuset +cpu"

echo $$ | tee "$TMP_CG"/cgroup.procs
tee "$TMP_CG"/cpuset.cpus <<< "0"
sleep .1

tee "$CGROUP"/cpu.max <<< "1000 4000"

run_sandbox "$CGROUP/cg-0" "0" 10000 "stress --cpu 1"
run_sandbox "$CGROUP/cg-3" "3" 1 "stress --cpu 1"

sleep 2
tee "$CGROUP"/cg-0/sub/cpuset.cpus <<< "3"

tee "$CGROUP"/cpu.max <<< "max"

read
killall stress
sleep .2
echo $$ | tee "$OLD_CG"/cgroup.procs
rmdir "$TMP_CG" /sys/fs/cgroup/slice/{cg-{0,3}{/sub,},}

# Often gives:
# cat /sys/kernel/debug/sched/debug | grep ":/slice" -A 28 | egrep "(:/slice)|tg_load_avg"
#
# cfs_rq[3]:/slice/cg-3/sub
# .tg_load_avg_contrib : 1039
# .tg_load_avg : 2036
# cfs_rq[3]:/slice/cg-0/sub
# .tg_load_avg_contrib : 1023
# .tg_load_avg : 1023
# cfs_rq[3]:/slice/cg-0
# .tg_load_avg_contrib : 102225
# .tg_load_avg : 102225
# cfs_rq[3]:/slice/cg-3
# .tg_load_avg_contrib : 4
# .tg_load_avg : 1001
# cfs_rq[3]:/slice
# .tg_load_avg_contrib : 1038
# .tg_load_avg : 1038
--- bash end

Thanks
Odin

Odin Ugedal (3):
sched/fair: Add tg_load_contrib cfs_rq decay checking
sched/fair: Correctly insert cfs_rq's to list on unthrottle
sched/fair: Fix ascii art by relpacing tabs

kernel/sched/fair.c | 22 +++++++++++++---------
kernel/sched/sched.h | 1 +
2 files changed, 14 insertions(+), 9 deletions(-)

--
2.31.1



2021-05-19 18:08:45

by Odin Ugedal

Subject: [PATCH 2/3] sched/fair: Correctly insert cfs_rq's to list on unthrottle

This fixes an issue where fairness is decreased since cfs_rq's can
end up not being decayed properly. For two sibling control groups with
the same priority, this can often lead to a load ratio of 99/1 (!!).

This happens because when a cfs_rq is throttled, all the descendant cfs_rq's
are removed from the leaf list. When the initial cfs_rq is
unthrottled, it will currently only re-add descendant cfs_rq's if they
have one or more entities enqueued. This is not a perfect heuristic: a
cfs_rq with no enqueued entities can still carry blocked load that
contributes to tg->load_avg, and if it is left off the list that load
never gets decayed.

This fix changes that behavior to record which cfs_rq's were removed from
the list, and re-adds them properly on unthrottle.

This can often lead to situations like this for equally weighted control
groups:

$ ps u -C stress
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 10009 88.8 0.0 3676 100 pts/1 R+ 11:04 0:13 stress --cpu 1
root 10023 3.0 0.0 3676 104 pts/1 R+ 11:04 0:00 stress --cpu 1

Fixes: 31bc6aeaab1d ("sched/fair: Optimize update_blocked_averages()")
Signed-off-by: Odin Ugedal <[email protected]>
---
kernel/sched/fair.c | 11 ++++++-----
kernel/sched/sched.h | 1 +
2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ceda53c2a87a..e7423d658389 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -376,7 +376,8 @@ static inline bool list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq)
return false;
}

-static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
+/* Returns 1 if cfs_rq was present in the list and removed */
+static inline bool list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
{
if (cfs_rq->on_list) {
struct rq *rq = rq_of(cfs_rq);
@@ -393,7 +394,9 @@ static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)

list_del_rcu(&cfs_rq->leaf_cfs_rq_list);
cfs_rq->on_list = 0;
+ return 1;
}
+ return 0;
}

static inline void assert_list_leaf_cfs_rq(struct rq *rq)
@@ -4742,9 +4745,7 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
if (!cfs_rq->throttle_count) {
cfs_rq->throttled_clock_task_time += rq_clock_task(rq) -
cfs_rq->throttled_clock_task;
-
- /* Add cfs_rq with already running entity in the list */
- if (cfs_rq->nr_running >= 1)
+ if (cfs_rq->insert_on_unthrottle)
list_add_leaf_cfs_rq(cfs_rq);
}

@@ -4759,7 +4760,7 @@ static int tg_throttle_down(struct task_group *tg, void *data)
/* group is entering throttled state, stop time */
if (!cfs_rq->throttle_count) {
cfs_rq->throttled_clock_task = rq_clock_task(rq);
- list_del_leaf_cfs_rq(cfs_rq);
+ cfs_rq->insert_on_unthrottle = list_del_leaf_cfs_rq(cfs_rq);
}
cfs_rq->throttle_count++;

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index a189bec13729..12a707d99ee6 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -602,6 +602,7 @@ struct cfs_rq {
u64 throttled_clock_task_time;
int throttled;
int throttle_count;
+ int insert_on_unthrottle;
struct list_head throttled_list;
#endif /* CONFIG_CFS_BANDWIDTH */
#endif /* CONFIG_FAIR_GROUP_SCHED */
--
2.31.1


2021-05-19 18:10:00

by Odin Ugedal

Subject: [PATCH 3/3] sched/fair: Fix ascii art by relpacing tabs

When using something other than 8 spaces per tab, this ascii art
makes no sense, and the reader might end up wondering what this
advanced equation "is".

Signed-off-by: Odin Ugedal <[email protected]>
---
kernel/sched/fair.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e7423d658389..c872e38ec32b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3142,7 +3142,7 @@ void reweight_task(struct task_struct *p, int prio)
*
* tg->weight * grq->load.weight
* ge->load.weight = ----------------------------- (1)
- * \Sum grq->load.weight
+ * \Sum grq->load.weight
*
* Now, because computing that sum is prohibitively expensive to compute (been
* there, done that) we approximate it with this average stuff. The average
@@ -3156,7 +3156,7 @@ void reweight_task(struct task_struct *p, int prio)
*
* tg->weight * grq->avg.load_avg
* ge->load.weight = ------------------------------ (3)
- * tg->load_avg
+ * tg->load_avg
*
* Where: tg->load_avg ~= \Sum grq->avg.load_avg
*
@@ -3172,7 +3172,7 @@ void reweight_task(struct task_struct *p, int prio)
*
* tg->weight * grq->load.weight
* ge->load.weight = ----------------------------- = tg->weight (4)
- * grp->load.weight
+ * grp->load.weight
*
* That is, the sum collapses because all other CPUs are idle; the UP scenario.
*
@@ -3191,7 +3191,7 @@ void reweight_task(struct task_struct *p, int prio)
*
* tg->weight * grq->load.weight
* ge->load.weight = ----------------------------- (6)
- * tg_load_avg'
+ * tg_load_avg'
*
* Where:
*
--
2.31.1


2021-05-27 13:28:47

by Vincent Guittot

Subject: Re: [PATCH 3/3] sched/fair: Fix ascii art by relpacing tabs

On Tue, 18 May 2021 at 14:55, Odin Ugedal <[email protected]> wrote:
>
> When using something other than 8 spaces per tab, this ascii art
> makes no sense, and the reader might end up wondering what this
> advanced equation "is".

Documentation/process/coding-style.rst says to use 8 characters for
tabs, so we should not really consider other tab values.

That being said, the numerators and other parts of the equations use
spaces whereas the denominators use tabs, so using spaces everywhere looks
good to me.

Acked-by: Vincent Guittot <[email protected]>

>
> Signed-off-by: Odin Ugedal <[email protected]>
> ---
> kernel/sched/fair.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e7423d658389..c872e38ec32b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3142,7 +3142,7 @@ void reweight_task(struct task_struct *p, int prio)
> *
> * tg->weight * grq->load.weight
> * ge->load.weight = ----------------------------- (1)
> - * \Sum grq->load.weight
> + * \Sum grq->load.weight
> *
> * Now, because computing that sum is prohibitively expensive to compute (been
> * there, done that) we approximate it with this average stuff. The average
> @@ -3156,7 +3156,7 @@ void reweight_task(struct task_struct *p, int prio)
> *
> * tg->weight * grq->avg.load_avg
> * ge->load.weight = ------------------------------ (3)
> - * tg->load_avg
> + * tg->load_avg
> *
> * Where: tg->load_avg ~= \Sum grq->avg.load_avg
> *
> @@ -3172,7 +3172,7 @@ void reweight_task(struct task_struct *p, int prio)
> *
> * tg->weight * grq->load.weight
> * ge->load.weight = ----------------------------- = tg->weight (4)
> - * grp->load.weight
> + * grp->load.weight
> *
> * That is, the sum collapses because all other CPUs are idle; the UP scenario.
> *
> @@ -3191,7 +3191,7 @@ void reweight_task(struct task_struct *p, int prio)
> *
> * tg->weight * grq->load.weight
> * ge->load.weight = ----------------------------- (6)
> - * tg_load_avg'
> + * tg_load_avg'
> *
> * Where:
> *
> --
> 2.31.1
>

2021-05-28 14:40:57

by Vincent Guittot

Subject: Re: [PATCH 2/3] sched/fair: Correctly insert cfs_rq's to list on unthrottle

On Tue, 18 May 2021 at 14:54, Odin Ugedal <[email protected]> wrote:
>
> This fixes an issue where fairness is decreased since cfs_rq's can
> end up not being decayed properly. For two sibling control groups with
> the same priority, this can often lead to a load ratio of 99/1 (!!).
>
> This happens because when a cfs_rq is throttled, all the descendant cfs_rq's
> are removed from the leaf list. When the initial cfs_rq is
> unthrottled, it will currently only re-add descendant cfs_rq's if they
> have one or more entities enqueued. This is not a perfect heuristic.

What would be the other condition in addition to the current one:
cfs_rq->nr_running >= 1?
We need to add a cfs_rq to the list if it still contributes to
tg->load_avg and the split of the share. Can't we add a condition for
this instead of adding a new field?

>
> This fix changes that behavior to record which cfs_rq's were removed from the
> list, and re-adds them properly on unthrottle.
>
> This can often lead to situations like this for equally weighted control
> groups:
>
> $ ps u -C stress
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> root 10009 88.8 0.0 3676 100 pts/1 R+ 11:04 0:13 stress --cpu 1
> root 10023 3.0 0.0 3676 104 pts/1 R+ 11:04 0:00 stress --cpu 1
>
> Fixes: 31bc6aeaab1d ("sched/fair: Optimize update_blocked_averages()")
> Signed-off-by: Odin Ugedal <[email protected]>
> ---
> kernel/sched/fair.c | 11 ++++++-----
> kernel/sched/sched.h | 1 +
> 2 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ceda53c2a87a..e7423d658389 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -376,7 +376,8 @@ static inline bool list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq)
> return false;
> }
>
> -static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
> +/* Returns 1 if cfs_rq was present in the list and removed */
> +static inline bool list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
> {
> if (cfs_rq->on_list) {
> struct rq *rq = rq_of(cfs_rq);
> @@ -393,7 +394,9 @@ static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
>
> list_del_rcu(&cfs_rq->leaf_cfs_rq_list);
> cfs_rq->on_list = 0;
> + return 1;
> }
> + return 0;
> }
>
> static inline void assert_list_leaf_cfs_rq(struct rq *rq)
> @@ -4742,9 +4745,7 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
> if (!cfs_rq->throttle_count) {
> cfs_rq->throttled_clock_task_time += rq_clock_task(rq) -
> cfs_rq->throttled_clock_task;
> -
> - /* Add cfs_rq with already running entity in the list */
> - if (cfs_rq->nr_running >= 1)
> + if (cfs_rq->insert_on_unthrottle)
> list_add_leaf_cfs_rq(cfs_rq);
> }
>
> @@ -4759,7 +4760,7 @@ static int tg_throttle_down(struct task_group *tg, void *data)
> /* group is entering throttled state, stop time */
> if (!cfs_rq->throttle_count) {
> cfs_rq->throttled_clock_task = rq_clock_task(rq);
> - list_del_leaf_cfs_rq(cfs_rq);
> + cfs_rq->insert_on_unthrottle = list_del_leaf_cfs_rq(cfs_rq);
> }
> cfs_rq->throttle_count++;
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index a189bec13729..12a707d99ee6 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -602,6 +602,7 @@ struct cfs_rq {
> u64 throttled_clock_task_time;
> int throttled;
> int throttle_count;
> + int insert_on_unthrottle;
> struct list_head throttled_list;
> #endif /* CONFIG_CFS_BANDWIDTH */
> #endif /* CONFIG_FAIR_GROUP_SCHED */
> --
> 2.31.1
>

2021-05-28 16:11:39

by Odin Ugedal

Subject: Re: [PATCH 2/3] sched/fair: Correctly insert cfs_rq's to list on unthrottle

Hi,

> What would be the other condition in addition to the current one
> :cfs_rq->nr_running >= 1 ?

The condition is that if it has load, we should add it (I don't have
100% control on util_avg and runnable_avg though). Using
"!cfs_rq_is_decayed()" is another way, but imo that is a bit
overkill.
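
For reference, that alternative would look roughly like this in
tg_unthrottle_up() (untested sketch on top of v5.13-rc2, and it ignores
that cfs_rq_is_decayed() is only defined under CONFIG_SMP):

	if (!cfs_rq->throttle_count) {
		cfs_rq->throttled_clock_task_time += rq_clock_task(rq) -
					cfs_rq->throttled_clock_task;

		/* Re-add the cfs_rq as long as it is not fully decayed */
		if (!cfs_rq_is_decayed(cfs_rq))
			list_add_leaf_cfs_rq(cfs_rq);
	}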

> We need to add a cfs_rq in the list if it still contributes to the
> tg->load_avg and the split of the share. Can't we add a condition for
> this instead of adding a new field ?

Yes, using cfs_rq->tg_load_avg_contrib as below would also work the
same way. I still think being explicit that we insert it if we have
removed it is cleaner in a way, as it makes it consistent with the
other uses of list_add_leaf_cfs_rq() and list_del_leaf_cfs_rq(), but
that is about preference I guess. I do however think that using
tg_load_avg_contrib will work just fine, as it should always be
positive in case the cfs_rq has some load. I can resend a v2 of this
patch using this instead:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ad7556f99b4a..969ae7f930f5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4720,7 +4720,7 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
 					cfs_rq->throttled_clock_task;
 
 		/* Add cfs_rq with already running entity in the list */
-		if (cfs_rq->nr_running >= 1)
+		if (cfs_rq->tg_load_avg_contrib)
 			list_add_leaf_cfs_rq(cfs_rq);
 	}

2021-05-28 16:23:05

by Vincent Guittot

Subject: Re: [PATCH 2/3] sched/fair: Correctly insert cfs_rq's to list on unthrottle

On Fri, 28 May 2021 at 17:07, Odin Ugedal <[email protected]> wrote:
>
> Hi,
>
> > What would be the other condition in addition to the current one
> > :cfs_rq->nr_running >= 1 ?
>
> The condition is that if it has load, we should add it (I don't have
> 100% control on util_avg and runnable_avg tho.). Using
> "!cfs_rq_is_decayed()" is another way, but imo. that is a bit
> overkill.

normally tg_load_avg_contrib should be null when cfs_rq_is_decayed()

>
> > We need to add a cfs_rq in the list if it still contributes to the
> > tg->load_avg and the split of the share. Can't we add a condition for
> > this instead of adding a new field ?
>
> Yes, using cfs_rq->tg_load_avg_contrib as below would also work the
> same way. I still think being explicit that we insert it if we have
> removed it is cleaner in a way, as it makes it consistent with the
> other use of list_add_leaf_cfs_rq() and list_del_leaf_cfs_rq(), but

The reason for this list is to ensure that the load of all cfs_rq's is
periodically updated, as it is then used to share the runtime between
groups, so we should keep using that rule whenever possible.
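
To make that concrete, the periodic update in __update_blocked_fair()
walks that list and drops fully decayed entries, roughly like this
(simplified fragment, some details of the v5.13 code trimmed):

	for_each_leaf_cfs_rq_safe(rq, cfs_rq, pos) {
		struct sched_entity *se;

		if (update_cfs_rq_load_avg(cfs_rq_clock_pelt(rq), cfs_rq))
			update_tg_load_avg(cfs_rq);

		/* Propagate pending load changes to the parent, if any: */
		se = cfs_rq->tg->se[cpu_of(rq)];
		if (se && !skip_blocked_update(se))
			update_load_avg(cfs_rq_of(se), se, UPDATE_TG);

		/* Don't let fully decayed cfs_rqs linger on the list. */
		if (cfs_rq_is_decayed(cfs_rq))
			list_del_leaf_cfs_rq(cfs_rq);
	}

A cfs_rq that is not on the list simply never gets this treatment.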

> that is about preference I guess. I do however think that using
> tg_load_avg_contrib will work just fine, as it should always be
> positive in case the cfs_rq has some load. I can resent v2 of this
> patch using this instead;
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ad7556f99b4a..969ae7f930f5 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4720,7 +4720,7 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
> cfs_rq->throttled_clock_task;
>
> /* Add cfs_rq with already running entity in the list */
> - if (cfs_rq->nr_running >= 1)
> + if (cfs_rq->tg_load_avg_contrib)

we probably need to keep (cfs_rq->nr_running >= 1) as we can have a case
where tg_load_avg_contrib is null but a task is enqueued

> list_add_leaf_cfs_rq(cfs_rq);
> }

2021-05-29 09:36:09

by Odin Ugedal

Subject: Re: [PATCH 2/3] sched/fair: Correctly insert cfs_rq's to list on unthrottle

Hi,

> normally tg_load_avg_contrib should be null when cfs_rq_is_decayed()

Yeah, I think that is an ok assumption of how it _should_ work (given
the other patches in flight are merged).

> The reason of this list is to ensure that the load of all cfs_rq are
> periodically updated as it is then used to share the runtime between
> groups so we should keep to use the rule whenever possible.

Yeah, right.

> we probably need to keep (cfs_rq->nr_running >= 1) as we can have case
> where tg_load_avg_contrib is null but a task is enqueued

Yeah, there is probably a chance of enqueuing a task without any load,
and then a parent gets throttled.
So (cfs_rq->tg_load_avg_contrib || cfs_rq->nr_running) is probably the
way to go if we want to avoid
a new field. Will resend a patch with that instead.
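
Concretely, the check in tg_unthrottle_up() would then become something
like this (untested sketch, not the final v2):

-		/* Add cfs_rq with already running entity in the list */
-		if (cfs_rq->nr_running >= 1)
+		/* Add cfs_rq that still has load or running entities in the list */
+		if (cfs_rq->tg_load_avg_contrib || cfs_rq->nr_running)
 			list_add_leaf_cfs_rq(cfs_rq);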

In case the new field is the main issue with the original solution, we
could also change the on_list int to have three modes like NO, YES,
THROTTLED/PAUSED, but that would require a bigger rewrite of the other
logic, so that is probably outside the scope of this patch.
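
Just to illustrate that idea (completely hypothetical, the names are
made up), on_list would then hold something like:

	enum leaf_cfs_rq_state {
		LEAF_CFS_RQ_NO,		/* not on the leaf list */
		LEAF_CFS_RQ_YES,	/* on the leaf list */
		LEAF_CFS_RQ_PAUSED,	/* removed due to throttling, re-add on unthrottle */
	};

and list_del_leaf_cfs_rq()/tg_unthrottle_up() would move it between
those states instead of flipping a separate insert_on_unthrottle flag.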

Thanks
Odin

2021-05-31 12:16:18

by Vincent Guittot

Subject: Re: [PATCH 2/3] sched/fair: Correctly insert cfs_rq's to list on unthrottle

On Sat, 29 May 2021 at 11:34, Odin Ugedal <[email protected]> wrote:
>
> Hi,
>
> > normally tg_load_avg_contrib should be null when cfs_rq_is_decayed()
>
> Yeah, I think that is an ok assumption of how it _should_ work (given
> the other patches in flight are merged).
>
> > The reason of this list is to ensure that the load of all cfs_rq are
> > periodically updated as it is then used to share the runtime between
> > groups so we should keep to use the rule whenever possible.
>
> Yeah, right.
>
> > we probably need to keep (cfs_rq->nr_running >= 1) as we can have case
> > where tg_load_avg_contrib is null but a task is enqueued
>
> Yeah, there is probably a chance of enqueuing a task without any load,
> and then a parent gets throttled.
> So (cfs_rq->tg_load_avg_contrib || cfs_rq->nr_running) is probably the
> way to go if we want to avoid
> a new field. Will resend a patch with that instead.

Thanks

>
> In case the new field is the main issue with the original solution, we
> could also change the on_list int to have three modes like; NO, YES,
> THROTTLED/PAUSED, but that would require a bigger rewrite of the other
> logic, so probably outside the scope of this patch.
>
> Thanks
> Odin

2021-06-01 14:06:53

by tip-bot2 for Jacob Pan

Subject: [tip: sched/core] sched/fair: Fix ascii art by relpacing tabs

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 08f7c2f4d0e9f4283f5796b8168044c034a1bfcb
Gitweb: https://git.kernel.org/tip/08f7c2f4d0e9f4283f5796b8168044c034a1bfcb
Author: Odin Ugedal <[email protected]>
AuthorDate: Tue, 18 May 2021 14:52:02 +02:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 01 Jun 2021 16:00:11 +02:00

sched/fair: Fix ascii art by relpacing tabs

When using something other than 8 spaces per tab, this ascii art
makes no sense, and the reader might end up wondering what this
advanced equation "is".

Signed-off-by: Odin Ugedal <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Vincent Guittot <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 161b92a..a2c30e5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3093,7 +3093,7 @@ void reweight_task(struct task_struct *p, int prio)
*
* tg->weight * grq->load.weight
* ge->load.weight = ----------------------------- (1)
- * \Sum grq->load.weight
+ * \Sum grq->load.weight
*
* Now, because computing that sum is prohibitively expensive to compute (been
* there, done that) we approximate it with this average stuff. The average
@@ -3107,7 +3107,7 @@ void reweight_task(struct task_struct *p, int prio)
*
* tg->weight * grq->avg.load_avg
* ge->load.weight = ------------------------------ (3)
- * tg->load_avg
+ * tg->load_avg
*
* Where: tg->load_avg ~= \Sum grq->avg.load_avg
*
@@ -3123,7 +3123,7 @@ void reweight_task(struct task_struct *p, int prio)
*
* tg->weight * grq->load.weight
* ge->load.weight = ----------------------------- = tg->weight (4)
- * grp->load.weight
+ * grp->load.weight
*
* That is, the sum collapses because all other CPUs are idle; the UP scenario.
*
@@ -3142,7 +3142,7 @@ void reweight_task(struct task_struct *p, int prio)
*
* tg->weight * grq->load.weight
* ge->load.weight = ----------------------------- (6)
- * tg_load_avg'
+ * tg_load_avg'
*
* Where:
*