When the calculate_imbalance() function calculates the imbalance, it may
actually produce a negative number. In that case it is meaningless to
return the so-called busiest group and go on to search for the busiest
CPU later. Therefore, the busiest group should only be returned when the
imbalance is greater than 0; otherwise return NULL.
Signed-off-by: zgpeng <[email protected]>
Reviewed-by: Samuel Liao <[email protected]>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 601f8bd..9f75303 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9639,7 +9639,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 force_balance:
 	/* Looks like there is an imbalance. Compute it */
 	calculate_imbalance(env, &sds);
-	return env->imbalance ? sds.busiest : NULL;
+	return env->imbalance > 0 ? sds.busiest : NULL;
 
 out_balanced:
 	env->imbalance = 0;
--
2.9.5
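
The effect of the one-liner is easy to demonstrate outside the kernel.
Below is a minimal user-space sketch (illustrative only: old_check() and
new_check() are made-up helpers, not kernel functions; the signed long
parameter mirrors the imbalance field of struct lb_env):

#include <stdio.h>

struct sched_group;	/* opaque stand-in, only ever used as a pointer */

static struct sched_group *fake_busiest = (struct sched_group *)0x1;

/* Old test: any nonzero imbalance, including a negative one, is truthy. */
static struct sched_group *old_check(long imbalance)
{
	return imbalance ? fake_busiest : NULL;
}

/* Patched test: only a strictly positive imbalance selects the group. */
static struct sched_group *new_check(long imbalance)
{
	return imbalance > 0 ? fake_busiest : NULL;
}

int main(void)
{
	long imbalance = -2;	/* the pathological case from the changelog */

	printf("old: %p\n", (void *)old_check(imbalance)); /* 0x1: bogus group */
	printf("new: %p\n", (void *)new_check(imbalance)); /* (nil): balanced */
	return 0;
}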
On Wed, 6 Apr 2022 at 13:23, zgpeng <[email protected]> wrote:
>
> When the calculate_imbalance() function calculates the imbalance, it may
> actually produce a negative number. In that case it is meaningless to
We should not return a negative imbalance but I suppose this can
happen when we are using the avg_load metrics to calculate imbalance.
Have you faced a use case where the imbalance is negative?
> return the so-called busiest group and go on to search for the busiest
> CPU later. Therefore, the busiest group should only be returned when
> the imbalance is greater than 0; otherwise return NULL.
>
> Signed-off-by: zgpeng <[email protected]>
> Reviewed-by: Samuel Liao <[email protected]>
> ---
> kernel/sched/fair.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 601f8bd..9f75303 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9639,7 +9639,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
>  force_balance:
>  	/* Looks like there is an imbalance. Compute it */
>  	calculate_imbalance(env, &sds);
> -	return env->imbalance ? sds.busiest : NULL;
> +	return env->imbalance > 0 ? sds.busiest : NULL;
>  
>  out_balanced:
>  	env->imbalance = 0;
> --
> 2.9.5
>
On Wed, 6 Apr 2022 at 17:07, 彭志刚 <[email protected]> wrote:
>
> Yes. The following is a specific scenario where a negative value occurs:
>
> The sched domain contains four groups: groupA, groupB, groupC and groupD.
> The types and avg_load values of the four groups are as follows:
>
> GroupA TYPE=group_fully_busy  avg_load=10 [local group]
> GroupB TYPE=group_has_spare   avg_load=1
> GroupC TYPE=group_has_spare   avg_load=1
> GroupD TYPE=group_overloaded  avg_load=20 [busiest group]
>
> The CPU that calls load_balance() is located in groupA, and
> update_sd_lb_stats() selects the busiest group among groupB, groupC and
> groupD, i.e. groupD. Under this condition the other checks in
> find_busiest_group() are bypassed and calculate_imbalance() is called.
> The checks in the middle of calculate_imbalance() cannot stop this
> situation either, so it falls through to the imbalance calculation at
> the end of the function. At this point the domain's avg_load is 8 while
> the local group's avg_load is 10; the negative value is therefore
> generated.
I think that this case should be covered by an additional test in
calculate_imbalance because we should not try to pull load in groupA
if it's already above the average load. Something like below should
cover this
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9469,6 +9469,16 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 
 	sds->avg_load = (sds->total_load * SCHED_CAPACITY_SCALE) /
 			sds->total_capacity;
+
+	/*
+	 * Don't pull any tasks if this group is already above the
+	 * domain average load.
+	 */
+	if (local->avg_load >= sds->avg_load) {
+		env->imbalance = 0;
+		return;
+	}
+
 	/*
 	 * If the local group is more loaded than the selected
 	 * busiest group don't try to pull any tasks.
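
To see how the numbers in the scenario above go negative, here is a
self-contained user-space sketch. The variable names are illustrative,
and the final subtraction condenses the kernel's imbalance computation
into a single step; the point it demonstrates is the real one, though:
avg_load values are unsigned long while env->imbalance is a signed long,
so a local group sitting above the domain average wraps around and reads
back negative.

#include <stdio.h>

#define SCHED_CAPACITY_SCALE 1024UL

int main(void)
{
	/*
	 * Groups A..D from the report. Give every group a capacity of
	 * SCHED_CAPACITY_SCALE so that avg_load equals group_load.
	 */
	unsigned long group_load[4] = { 10, 1, 1, 20 };
	unsigned long total_load = 0, total_capacity = 0;
	int i;

	for (i = 0; i < 4; i++) {
		total_load += group_load[i];
		total_capacity += SCHED_CAPACITY_SCALE;
	}

	/* Same formula as the hunk above: 32 * 1024 / 4096 == 8. */
	unsigned long domain_avg = total_load * SCHED_CAPACITY_SCALE /
				   total_capacity;
	unsigned long local_avg = group_load[0];	/* groupA: 10 */

	/* 8 - 10 wraps to ULONG_MAX - 1, then reads back as -2. */
	long imbalance = (long)(domain_avg - local_avg);

	printf("domain avg_load=%lu local avg_load=%lu imbalance=%ld\n",
	       domain_avg, local_avg, imbalance);
	return 0;
}

With the extra check in the hunk above, this case bails out with
env->imbalance = 0 before the wrapping subtraction is ever reached.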
>
>
> On Wed, 6 Apr 2022 at 20:59, Vincent Guittot <[email protected]> wrote:
>>
>> [...]
On 06/04/2022 17:33, Vincent Guittot wrote:
> On Wed, 6 Apr 2022 at 17:07, 彭志刚 <[email protected]> wrote:
[...]
> I think that this case should be covered by an additional test in
> calculate_imbalance because we should not try to pull load in groupA
> if it's already above the average load. Something like below should
> cover this
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9469,6 +9469,16 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
>
>  	sds->avg_load = (sds->total_load * SCHED_CAPACITY_SCALE) /
>  			sds->total_capacity;
> +
> +	/*
> +	 * Don't pull any tasks if this group is already above the
> +	 * domain average load.
> +	 */
> +	if (local->avg_load >= sds->avg_load) {
> +		env->imbalance = 0;
> +		return;
> +	}
> +
LGTM. Like for the `local->group_type == group_overloaded` case in
find_busiest_group() where we force `goto out_balanced`.
[...]
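
For reference, the precedent Dietmar points to looks roughly like the
following in mainline find_busiest_group() (paraphrased from v5.x
kernel/sched/fair.c; exact comments and surrounding lines may differ):

	if (local->group_type == group_overloaded) {
		/*
		 * If the local group is more loaded than the selected
		 * busiest group don't try to pull any tasks.
		 */
		if (local->avg_load >= busiest->avg_load)
			goto out_balanced;

		sds.avg_load = (sds.total_load * SCHED_CAPACITY_SCALE) /
				sds.total_capacity;

		/*
		 * Don't pull any tasks if this group is already above the
		 * domain average load.
		 */
		if (local->avg_load >= sds.avg_load)
			goto out_balanced;
	}

Vincent's proposed hunk adds the same kind of avg_load guard to
calculate_imbalance(), so the negative imbalance is never computed in
the first place.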