schbench shows a latency increase at the 95th percentile and above since:

  commit 0b0695f2b34a ("sched/fair: Rework load_balance()")

Align the behavior of the load balancer with the wake up path, which tries
to select an idle CPU which belongs to the LLC for a waking task.

calculate_imbalance() will use nr_running instead of the spare
capacity when CPUs share resources (i.e. cache) at the domain level. This
will ensure a better spread of tasks on idle CPUs.
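
As a rough illustration of the decision described above, here is a
standalone sketch of the condition, not the kernel code itself. The enum
mirrors the ordering of the scheduler's group_type values; the helper name
balance_on_spare_capacity() and the numeric value given to
SD_SHARE_PKG_RESOURCES are assumptions made only for this example.

```c
#include <stdbool.h>

/* Simplified copy of the scheduler's group_type ordering. */
enum group_type {
	group_has_spare = 0,
	group_fully_busy,
	group_misfit_task,
	group_asym_packing,
	group_imbalanced,
	group_overloaded,
};

/* Illustrative value only; the real flag is defined by the kernel. */
#define SD_SHARE_PKG_RESOURCES 0x1

/*
 * Hypothetical helper: returns true when the imbalance would be computed
 * from spare capacity, false when the task-count (nr_running) path is
 * taken instead.
 */
static bool balance_on_spare_capacity(enum group_type local,
				      enum group_type busiest,
				      int sd_flags)
{
	return local == group_has_spare &&
	       busiest > group_fully_busy &&
	       !(sd_flags & SD_SHARE_PKG_RESOURCES);
}
```

With an overloaded busiest group and a local group that has spare
capacity, the helper returns false when the domain has
SD_SHARE_PKG_RESOURCES set, so tasks are spread by count, and true
otherwise.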

Running schbench on a hikey (8 cores, arm64) shows the problem:
tip/sched/core :
schbench -m 2 -t 4 -s 10000 -c 1000000 -r 10
Latency percentiles (usec)
50.0th: 33
75.0th: 45
90.0th: 51
95.0th: 4152
*99.0th: 14288
99.5th: 14288
99.9th: 14288
min=0, max=14276
tip/sched/core + patch :
schbench -m 2 -t 4 -s 10000 -c 1000000 -r 10
Latency percentiles (usec)
50.0th: 34
75.0th: 47
90.0th: 52
95.0th: 78
*99.0th: 94
99.5th: 94
99.9th: 94
min=0, max=94
Fixes: 0b0695f2b34a ("sched/fair: Rework load_balance()")
Reported-by: Chris Mason <[email protected]>
Suggested-by: Rik van Riel <[email protected]>
Signed-off-by: Vincent Guittot <[email protected]>
---
kernel/sched/fair.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index aa4c6227cd6d..210b15f068a6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9031,7 +9031,8 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 	 * emptying busiest.
 	 */
 	if (local->group_type == group_has_spare) {
-		if (busiest->group_type > group_fully_busy) {
+		if ((busiest->group_type > group_fully_busy) &&
+		    !(env->sd->flags & SD_SHARE_PKG_RESOURCES)) {
 			/*
 			 * If busiest is overloaded, try to fill spare
 			 * capacity. This might end up creating spare capacity
--
2.17.1
On Mon, 2020-11-02 at 11:24 +0100, Vincent Guittot wrote:
> Fixes: 0b0695f2b34a ("sched/fair: Rework load_balance()")
> Reported-by: Chris Mason <[email protected]>
> Suggested-by: Rik van Riel <[email protected]>
> Signed-off-by: Vincent Guittot <[email protected]>
Tested-and-reviewed-by: Rik van Riel <[email protected]>
Thank you!
--
All Rights Reversed.
On Mon, Nov 02, 2020 at 11:24:57AM +0100, Vincent Guittot wrote:
> Fixes: 0b0695f2b34a ("sched/fair: Rework load_balance()")
> Reported-by: Chris Mason <[email protected]>
> Suggested-by: Rik van Riel <[email protected]>
> Signed-off-by: Vincent Guittot <[email protected]>
Thanks!
On Mon, Nov 02, 2020 at 11:24:57AM +0100, Vincent Guittot wrote:
> Fixes: 0b0695f2b34a ("sched/fair: Rework load_balance()")
> Reported-by: Chris Mason <[email protected]>
> Suggested-by: Rik van Riel <[email protected]>
> Signed-off-by: Vincent Guittot <[email protected]>
Acked-by: Mel Gorman <[email protected]>
--
Mel Gorman
SUSE Labs