Date: Mon, 23 Nov 2015 17:56:44 +0000
From: Javi Merino
To: Jacob Pan
Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, John Stultz, LKML,
 Arjan van de Ven, Srinivas Pandruvada, Len Brown, Rafael Wysocki,
 Eduardo Valentin, Paul Turner, Morten Rasmussen, Juri Lelli
Subject: Re: [PATCH 3/4] sched: introduce synchronized idle injection
Message-ID: <20151123175643.GA10703@e104805>
In-Reply-To: <1447444387-23525-4-git-send-email-jacob.jun.pan@linux.intel.com>

On Fri, Nov 13, 2015 at 11:53:06AM -0800, Jacob Pan wrote:
> With increasingly constrained power and thermal budget, it's often
> necessary to cap power via throttling. Throttling individual CPUs
> or devices at random times can help power capping but may not be
> optimal in terms of energy efficiency. Frequency scaling is also
> limited by certain range before losing energy efficiency.
>
> In general, the optimal solution in terms of energy efficiency is
> to align idle periods such that more shared circuits can be power
> gated to enter lower power states. Combined with energy efficient
> frequency point, idle injection provides a way to scale power and
> performance efficiently.
>
> This patch introduces a scheduler based idle injection method, it
> works by blocking CFS runqueue synchronously and periodically. The
> actions on all online CPUs are orchestrated by per CPU hrtimers.
>
> Two sysctl knobs are given to the userspace for selecting the
> percentage of idle time as well as the forced idle duration for each
> idle period injected.
>
> Since only CFS class is targeted, other high priority tasks are not
> affected, such as EDF and RT tasks as well as softirq and interrupts.
>
> Hotpath in CFS pick_next_task is optimized by Peter Zijlstra, where
> a new runnable flag is introduced to combine forced idle and
> nr_running.
>
> Signed-off-by: Jacob Pan
> ---
>  include/linux/sched.h        |  11 ++
>  include/linux/sched/sysctl.h |   5 +
>  init/Kconfig                 |  10 ++
>  kernel/sched/fair.c          | 353 ++++++++++++++++++++++++++++++++++++++++++-
>  kernel/sched/sched.h         |  54 ++++++-
>  kernel/sysctl.c              |  21 +++
>  6 files changed, 449 insertions(+), 5 deletions(-)

I've tested this series on Juno (2x Cortex-A57, 4x Cortex-A53). With idle
injection set to 50% of the time, when I run 6 busy loops the scheduler
sometimes keeps two of them on the same CPU while another CPU is completely
idle. Without idle injection the scheduler does the sensible thing: it puts
one busy loop on each CPU.

I'm running systemd, and this only happens with CONFIG_SCHED_AUTOGROUP=y.
If I unset CONFIG_SCHED_AUTOGROUP, the tasks are spread across all CPUs as
usual.

Below is the part of the trace that shows this problem. CPU3 has two 100%
tasks, 1554 and 1549, but the scheduler never moves one of them to CPU4,
which has an empty runqueue. Both CPUs are in the same scheduling domain.

Juri helped me add two additional tracepoints to track the load of a task
and of a cpu. These tracepoints are added at the end of update_load_avg().
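For reference, the busy_loop binary I ran is not attached to this mail; a
minimal, hypothetical reconstruction (the function names, process count and
duration here are my own choices, not the original program) would be
something like:

```python
# Hypothetical reconstruction of the test load: N processes that each spin
# at 100% CPU for a fixed wall-clock duration, then exit.
import multiprocessing
import time

def busy_loop(seconds):
    """Spin at 100% CPU until `seconds` of wall-clock time have elapsed."""
    deadline = time.monotonic() + seconds
    n = 0
    while time.monotonic() < deadline:
        n += 1  # meaningless work to keep the CPU busy
    return n

def run_load(nproc=6, seconds=30.0):
    """Start nproc busy loops, mirroring the 6 loops on the 6-CPU Juno."""
    procs = [multiprocessing.Process(target=busy_loop, args=(seconds,))
             for _ in range(nproc)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

With idle injection enabled and autogroup on, running run_load() while
capturing the sched events is enough to reproduce the placement problem
shown in the trace.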
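The placement problem is easiest to see by pulling the sched_switch events
out of the trace and recording which busy_loop pids each CPU ever runs. A
small sketch of that (the regex assumes the standard ftrace text format;
the sample lines are taken from the trace below):

```python
# Extract per-CPU busy_loop placement from ftrace sched_switch events.
import re
from collections import defaultdict

# Matches "[003] 164.751816: sched_switch: ... ==> busy_loop:1554 [120]"
SWITCH_RE = re.compile(
    r"\[(?P<cpu>\d{3})\]\s+\d+\.\d+:\s+sched_switch:.*==>\s+"
    r"busy_loop:(?P<pid>\d+)")

def tasks_per_cpu(trace_lines):
    """Map cpu -> set of busy_loop pids that were ever switched in on it."""
    cpus = defaultdict(set)
    for line in trace_lines:
        m = SWITCH_RE.search(line)
        if m:
            cpus[int(m.group("cpu"))].add(int(m.group("pid")))
    return dict(cpus)

sample = [
    "<idle>-0 [002] 164.739805: sched_switch: swapper/2:0 [120] R ==> busy_loop:1552 [120]",
    "<idle>-0 [003] 164.739810: sched_switch: swapper/3:0 [120] R ==> busy_loop:1549 [120]",
    "busy_loop-1549 [003] 164.751816: sched_switch: busy_loop:1549 [120] R ==> busy_loop:1554 [120]",
]
```

On the trace below, this reports both 1549 and 1554 on CPU3 and no
busy_loop at all on CPU4.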
<idle>-0 [002] 164.739796: sched_cfs_idle_inject_timer: throttled=0
<idle>-0 [000] 164.739797: sched_cfs_idle_inject_timer: throttled=0
<idle>-0 [005] 164.739797: sched_cfs_idle_inject_timer: throttled=0
<idle>-0 [001] 164.739797: sched_cfs_idle_inject_timer: throttled=0
<idle>-0 [003] 164.739797: sched_cfs_idle_inject_timer: throttled=0
<idle>-0 [004] 164.739798: sched_cfs_idle_inject_timer: throttled=0
<idle>-0 [002] 164.739802: sched_load_avg_cpu: cpu=2 load_avg=171 util_avg=406
<idle>-0 [002] 164.739803: sched_load_avg_task: comm=busy_loop pid=1552 cpu=2 load_avg=1006 util_avg=400 load_sum=48043453 util_sum=19130537 period_contrib=173
<idle>-0 [001] 164.739803: sched_load_avg_cpu: cpu=1 load_avg=170 util_avg=405
<idle>-0 [002] 164.739804: sched_load_avg_cpu: cpu=2 load_avg=1014 util_avg=403
<idle>-0 [001] 164.739804: sched_load_avg_task: comm=busy_loop pid=1551 cpu=1 load_avg=1008 util_avg=401 load_sum=48161276 util_sum=19177731 period_contrib=288
<idle>-0 [005] 164.739804: sched_load_avg_cpu: cpu=5 load_avg=169 util_avg=404
<idle>-0 [002] 164.739805: sched_switch: swapper/2:0 [120] R ==> busy_loop:1552 [120]
<idle>-0 [001] 164.739805: sched_load_avg_cpu: cpu=1 load_avg=1024 util_avg=407
<idle>-0 [003] 164.739805: sched_load_avg_cpu: cpu=3 load_avg=340 util_avg=405
<idle>-0 [000] 164.739805: sched_load_avg_cpu: cpu=0 load_avg=168 util_avg=400
<idle>-0 [001] 164.739806: sched_switch: swapper/1:0 [120] R ==> busy_loop:1551 [120]
<idle>-0 [005] 164.739806: sched_load_avg_task: comm=busy_loop pid=1550 cpu=5 load_avg=1010 util_avg=402 load_sum=48229881 util_sum=19205027 period_contrib=355
<idle>-0 [003] 164.739807: sched_load_avg_task: comm=busy_loop pid=1549 cpu=3 load_avg=1012 util_avg=193 load_sum=48316673 util_sum=9247244 period_contrib=441
<idle>-0 [000] 164.739807: sched_load_avg_task: comm=busy_loop pid=1553 cpu=0 load_avg=1005 util_avg=400 load_sum=48003551 util_sum=19119112 period_contrib=134
<idle>-0 [005] 164.739808: sched_load_avg_cpu: cpu=5 load_avg=1002 util_avg=399
<idle>-0 [003] 164.739808: sched_load_avg_cpu: cpu=3 load_avg=2045 util_avg=407
<idle>-0 [000] 164.739809: sched_load_avg_cpu: cpu=0 load_avg=1008 util_avg=401
<idle>-0 [005] 164.739810: sched_switch: swapper/5:0 [120] R ==> busy_loop:1550 [120]
<idle>-0 [003] 164.739810: sched_switch: swapper/3:0 [120] R ==> busy_loop:1549 [120]
<idle>-0 [000] 164.739811: sched_switch: swapper/0:0 [120] R ==> busy_loop:1553 [120]
busy_loop-1552 [002] 164.743793: sched_stat_runtime: comm=busy_loop pid=1552 runtime=3991560 [ns] vruntime=605432548 [ns]
busy_loop-1549 [003] 164.743794: sched_stat_runtime: comm=busy_loop pid=1549 runtime=3990040 [ns] vruntime=382380848 [ns]
busy_loop-1552 [002] 164.743794: sched_load_avg_task: comm=busy_loop pid=1552 cpu=2 load_avg=1024 util_avg=456 load_sum=48889883 util_sum=21796057 period_contrib=999
busy_loop-1553 [000] 164.743794: sched_stat_runtime: comm=busy_loop pid=1553 runtime=3990180 [ns] vruntime=590391894 [ns]
busy_loop-1551 [001] 164.743794: sched_stat_runtime: comm=busy_loop pid=1551 runtime=3992100 [ns] vruntime=272056341 [ns]
busy_loop-1550 [005] 164.743794: sched_stat_runtime: comm=busy_loop pid=1550 runtime=3990920 [ns] vruntime=198320034 [ns]
busy_loop-1552 [002] 164.743795: sched_load_avg_cpu: cpu=2 load_avg=1010 util_avg=450
busy_loop-1551 [001] 164.743796: sched_load_avg_task: comm=busy_loop pid=1551 cpu=1 load_avg=1004 util_avg=447 load_sum=47958941 util_sum=21380913 period_contrib=90
busy_loop-1549 [003] 164.743796: sched_load_avg_task: comm=busy_loop pid=1549 cpu=3 load_avg=1007 util_avg=257 load_sum=48112396 util_sum=12285572 period_contrib=241
busy_loop-1552 [002] 164.743796: sched_load_avg_cpu: cpu=2 load_avg=170 util_avg=453
busy_loop-1553 [000] 164.743796: sched_load_avg_task: comm=busy_loop pid=1553 cpu=0 load_avg=1023 util_avg=456 load_sum=48847931 util_sum=21780791 period_contrib=958
busy_loop-1551 [001] 164.743796: sched_load_avg_cpu: cpu=1 load_avg=1020 util_avg=454
busy_loop-1550 [005] 164.743797: sched_load_avg_task: comm=busy_loop pid=1550 cpu=5 load_avg=1005 util_avg=448 load_sum=48026522 util_sum=21410614 period_contrib=156
busy_loop-1549 [003] 164.743797: sched_load_avg_cpu: cpu=3 load_avg=2036 util_avg=454
busy_loop-1553 [000] 164.743798: sched_load_avg_cpu: cpu=0 load_avg=1004 util_avg=447
busy_loop-1551 [001] 164.743798: sched_load_avg_cpu: cpu=1 load_avg=169 util_avg=452
busy_loop-1550 [005] 164.743798: sched_load_avg_cpu: cpu=5 load_avg=1020 util_avg=455
busy_loop-1553 [000] 164.743800: sched_load_avg_cpu: cpu=0 load_avg=171 util_avg=456
busy_loop-1549 [003] 164.743800: sched_load_avg_cpu: cpu=3 load_avg=339 util_avg=452
busy_loop-1550 [005] 164.743800: sched_load_avg_cpu: cpu=5 load_avg=168 util_avg=450
busy_loop-1552 [002] 164.747792: sched_stat_runtime: comm=busy_loop pid=1552 runtime=3999320 [ns] vruntime=609431868 [ns]
busy_loop-1553 [000] 164.747793: sched_stat_runtime: comm=busy_loop pid=1553 runtime=3999380 [ns] vruntime=594391274 [ns]
busy_loop-1549 [003] 164.747793: sched_stat_runtime: comm=busy_loop pid=1549 runtime=3999540 [ns] vruntime=386380388 [ns]
busy_loop-1552 [002] 164.747794: sched_load_avg_task: comm=busy_loop pid=1552 cpu=2 load_avg=1019 util_avg=499 load_sum=48694671 util_sum=23849523 period_contrib=808
busy_loop-1551 [001] 164.747794: sched_stat_runtime: comm=busy_loop pid=1551 runtime=3999880 [ns] vruntime=276056221 [ns]
busy_loop-1550 [005] 164.747795: sched_stat_runtime: comm=busy_loop pid=1550 runtime=3999280 [ns] vruntime=202319314 [ns]
busy_loop-1552 [002] 164.747795: sched_load_avg_cpu: cpu=2 load_avg=1006 util_avg=492
busy_loop-1551 [001] 164.747795: sched_load_avg_task: comm=busy_loop pid=1551 cpu=1 load_avg=1022 util_avg=500 load_sum=48813533 util_sum=23907693 period_contrib=924
busy_loop-1553 [000] 164.747795: sched_load_avg_task: comm=busy_loop pid=1553 cpu=0 load_avg=1019 util_avg=499 load_sum=48652717 util_sum=23832040 period_contrib=767
busy_loop-1549 [003] 164.747796: sched_load_avg_task: comm=busy_loop pid=1549 cpu=3 load_avg=1003 util_avg=315 load_sum=47917292 util_sum=15063949 period_contrib=50
busy_loop-1551 [001] 164.747796: sched_load_avg_cpu: cpu=1 load_avg=1016 util_avg=497
busy_loop-1552 [002] 164.747796: sched_load_avg_cpu: cpu=2 load_avg=169 util_avg=496
busy_loop-1550 [005] 164.747797: sched_load_avg_task: comm=busy_loop pid=1550 cpu=5 load_avg=1023 util_avg=501 load_sum=48880090 util_sum=23938753 period_contrib=989
busy_loop-1553 [000] 164.747797: sched_load_avg_cpu: cpu=0 load_avg=1022 util_avg=500
busy_loop-1549 [003] 164.747797: sched_load_avg_cpu: cpu=3 load_avg=2028 util_avg=496
busy_loop-1551 [001] 164.747797: sched_load_avg_cpu: cpu=1 load_avg=169 util_avg=495
busy_loop-1550 [005] 164.747798: sched_load_avg_cpu: cpu=5 load_avg=1016 util_avg=497
busy_loop-1553 [000] 164.747799: sched_load_avg_cpu: cpu=0 load_avg=170 util_avg=499
busy_loop-1549 [003] 164.747800: sched_load_avg_cpu: cpu=3 load_avg=337 util_avg=494
busy_loop-1550 [005] 164.747800: sched_load_avg_cpu: cpu=5 load_avg=168 util_avg=492
busy_loop-1552 [002] 164.751792: sched_stat_runtime: comm=busy_loop pid=1552 runtime=4000260 [ns] vruntime=613432128 [ns]
busy_loop-1549 [003] 164.751793: sched_stat_runtime: comm=busy_loop pid=1549 runtime=3999760 [ns] vruntime=390380148 [ns]
busy_loop-1553 [000] 164.751793: sched_stat_runtime: comm=busy_loop pid=1553 runtime=3999920 [ns] vruntime=598391194 [ns]
busy_loop-1552 [002] 164.751793: sched_load_avg_task: comm=busy_loop pid=1552 cpu=2 load_avg=1015 util_avg=538 load_sum=48500452 util_sum=25717351 period_contrib=618
busy_loop-1550 [005] 164.751793: sched_stat_runtime: comm=busy_loop pid=1550 runtime=3999920 [ns] vruntime=206319234 [ns]
busy_loop-1552 [002] 164.751794: sched_load_avg_cpu: cpu=2 load_avg=1024 util_avg=542
busy_loop-1551 [001] 164.751794: sched_stat_runtime: comm=busy_loop pid=1551 runtime=4000120 [ns] vruntime=280056341 [ns]
busy_loop-1549 [003] 164.751795: sched_load_avg_task: comm=busy_loop pid=1549 cpu=3 load_avg=1021 util_avg=376 load_sum=48771927 util_sum=17985591 period_contrib=884
busy_loop-1553 [000] 164.751795: sched_load_avg_task: comm=busy_loop pid=1553 cpu=0 load_avg=1015 util_avg=538 load_sum=48458496 util_sum=25697835 period_contrib=577
busy_loop-1551 [001] 164.751795: sched_load_avg_task: comm=busy_loop pid=1551 cpu=1 load_avg=1018 util_avg=539 load_sum=48619308 util_sum=25780552 period_contrib=734
busy_loop-1550 [005] 164.751795: sched_load_avg_task: comm=busy_loop pid=1550 cpu=5 load_avg=1019 util_avg=540 load_sum=48685865 util_sum=25814558 period_contrib=799
busy_loop-1552 [002] 164.751796: sched_load_avg_cpu: cpu=2 load_avg=169 util_avg=535
busy_loop-1551 [001] 164.751796: sched_load_avg_cpu: cpu=1 load_avg=1011 util_avg=536
busy_loop-1553 [000] 164.751797: sched_load_avg_cpu: cpu=0 load_avg=1018 util_avg=539
busy_loop-1549 [003] 164.751797: sched_load_avg_cpu: cpu=3 load_avg=2020 util_avg=535
busy_loop-1550 [005] 164.751797: sched_load_avg_cpu: cpu=5 load_avg=1012 util_avg=536
busy_loop-1551 [001] 164.751797: sched_load_avg_cpu: cpu=1 load_avg=168 util_avg=533
busy_loop-1553 [000] 164.751799: sched_load_avg_cpu: cpu=0 load_avg=169 util_avg=538
busy_loop-1549 [003] 164.751799: sched_load_avg_cpu: cpu=3 load_avg=336 util_avg=533
busy_loop-1550 [005] 164.751800: sched_load_avg_cpu: cpu=5 load_avg=171 util_avg=543
busy_loop-1549 [003] 164.751807: sched_stat_runtime: comm=busy_loop pid=1549 runtime=13700 [ns] vruntime=390393848 [ns]
busy_loop-1549 [003] 164.751809: sched_load_avg_task: comm=busy_loop pid=1549 cpu=3 load_avg=1021 util_avg=376 load_sum=48785239 util_sum=17998903 period_contrib=897
busy_loop-1549 [003] 164.751811: sched_load_avg_cpu: cpu=3 load_avg=2020 util_avg=535
busy_loop-1549 [003] 164.751812: sched_load_avg_task: comm=busy_loop pid=1554 cpu=3 load_avg=1015 util_avg=163 load_sum=48472554 util_sum=7827475 period_contrib=593
busy_loop-1549 [003] 164.751814: sched_load_avg_cpu: cpu=3 load_avg=2020 util_avg=535
busy_loop-1549 [003] 164.751816: sched_switch: busy_loop:1549 [120] R ==> busy_loop:1554 [120]
busy_loop-1552 [002] 164.755792: sched_stat_runtime: comm=busy_loop pid=1552 runtime=3999800 [ns] vruntime=617431928 [ns]
busy_loop-1553 [000] 164.755793: sched_stat_runtime: comm=busy_loop pid=1553 runtime=3999880 [ns] vruntime=602391074 [ns]
busy_loop-1552 [002] 164.755793: sched_load_avg_task: comm=busy_loop pid=1552 cpu=2 load_avg=1011 util_avg=574 load_sum=48306205 util_sum=27414009 period_contrib=428
busy_loop-1550 [005] 164.755793: sched_stat_runtime: comm=busy_loop pid=1550 runtime=3999780 [ns] vruntime=210319014 [ns]
busy_loop-1554 [003] 164.755793: sched_stat_runtime: comm=busy_loop pid=1554 runtime=3986540 [ns] vruntime=382907621 [ns]
busy_loop-1552 [002] 164.755794: sched_load_avg_cpu: cpu=2 load_avg=1019 util_avg=578
busy_loop-1551 [001] 164.755794: sched_stat_runtime: comm=busy_loop pid=1551 runtime=3999860 [ns] vruntime=284056201 [ns]
busy_loop-1553 [000] 164.755795: sched_load_avg_task: comm=busy_loop pid=1553 cpu=0 load_avg=1010 util_avg=573 load_sum=48264247 util_sum=27392629 period_contrib=387
busy_loop-1551 [001] 164.755795: sched_load_avg_task: comm=busy_loop pid=1551 cpu=1 load_avg=1014 util_avg=575 load_sum=48425055 util_sum=27481823 period_contrib=544
busy_loop-1552 [002] 164.755795: sched_load_avg_cpu: cpu=2 load_avg=168 util_avg=570
busy_loop-1550 [005] 164.755795: sched_load_avg_task: comm=busy_loop pid=1550 cpu=5 load_avg=1015 util_avg=576 load_sum=48491612 util_sum=27518531 period_contrib=609
busy_loop-1554 [003] 164.755796: sched_load_avg_task: comm=busy_loop pid=1554 cpu=3 load_avg=1010 util_avg=230 load_sum=48265186 util_sum=10993484 period_contrib=390
busy_loop-1551 [001] 164.755796: sched_load_avg_cpu: cpu=1 load_avg=1007 util_avg=571
busy_loop-1553 [000] 164.755796: sched_load_avg_cpu: cpu=0 load_avg=1014 util_avg=575
busy_loop-1550 [005] 164.755797: sched_load_avg_cpu: cpu=5 load_avg=1008 util_avg=572
busy_loop-1554 [003] 164.755797: sched_load_avg_cpu: cpu=3 load_avg=2012 util_avg=571
busy_loop-1551 [001] 164.755797: sched_load_avg_cpu: cpu=1 load_avg=171 util_avg=581
busy_loop-1553 [000] 164.755799: sched_load_avg_cpu: cpu=0 load_avg=168 util_avg=574
busy_loop-1550 [005] 164.755799: sched_load_avg_cpu: cpu=5 load_avg=170 util_avg=579
busy_loop-1554 [003] 164.755799: sched_load_avg_cpu: cpu=3 load_avg=342 util_avg=581
busy_loop-1552 [002] 164.759791: sched_cfs_idle_inject_timer: throttled=1
busy_loop-1551 [001] 164.759791: sched_cfs_idle_inject_timer: throttled=1
busy_loop-1550 [005] 164.759792: sched_cfs_idle_inject_timer: throttled=1
busy_loop-1554 [003] 164.759792: sched_cfs_idle_inject_timer: throttled=1
busy_loop-1553 [000] 164.759792: sched_cfs_idle_inject_timer: throttled=1

Cheers,
Javi