From: "Rafael J. Wysocki"
To: Quentin Perret
Cc: peterz@infradead.org, linux-kernel@vger.kernel.org,
    linux-pm@vger.kernel.org, gregkh@linuxfoundation.org, mingo@redhat.com,
    dietmar.eggemann@arm.com, morten.rasmussen@arm.com, chris.redpath@arm.com,
    patrick.bellasi@arm.com, valentin.schneider@arm.com,
    vincent.guittot@linaro.org, thara.gopinath@linaro.org,
    viresh.kumar@linaro.org, tkjos@google.com, joel@joelfernandes.org,
    smuckle@google.com, adharmap@codeaurora.org, skannan@codeaurora.org,
    pkondeti@codeaurora.org, juri.lelli@redhat.com, edubezval@gmail.com,
    srinivas.pandruvada@linux.intel.com, currojerez@riseup.net,
    javi.merino@kernel.org
Subject: Re: [PATCH v6 00/14] Energy Aware Scheduling
Date: Mon, 10 Sep 2018 11:12:11 +0200
Message-ID: <2338896.lUgtDMVfv7@aspire.rjw.lan>
In-Reply-To: <20180820094420.26590-1-quentin.perret@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Monday, August 20, 2018 11:44:06 AM CEST Quentin Perret wrote:
> This patch series introduces Energy Aware Scheduling (EAS) for CFS tasks
> on platforms with asymmetric CPU topologies (e.g. Arm big.LITTLE).
>
> For more details about the ideas behind it and the overall design,
> please refer to the cover letter of version 5 [1].
>
>
> 1. Version History
> ------------------
>
> Changes v5[1]->v6:
> - Rebased on Peter’s sched/core branch (that includes Morten's misfit
>   patches [2] and the automatic detection of SD_ASYM_CPUCAPACITY [3])
> - Removed patch 13/14 (not needed with the automatic flag detection)
> - Added patch creating a dependency between sugov and EAS
> - Renamed frequency domains to performance domains to avoid creating too
>   deep assumptions in the code about the HW
> - Renamed the sd_ea shortcut sd_asym_cpucapacity
> - Added comment to explain why new tasks are not accounted when
>   detecting the 'overutilized' flag
> - Added comment explaining why forkees don’t go in
>   find_energy_efficient_cpu()
>
> Changes v4[4]->v5:
> - Removed the RCU protection of the EM tables and the associated
>   need for em_rescale_cpu_capacity().
> - Factorized schedutil’s PELT aggregation function with EAS
> - Improved comments/doc in the EM framework
> - Added check on the uarch of CPUs in one fd in the EM framework
> - Reduced CONFIG_ENERGY_MODEL ifdefery in kernel/sched/topology.c
> - Cleaned-up update_sg_lb_stats parameters
> - Improved comments in compute_energy() to explain the multi-rd
>   scenarios
>
> Changes v3[5]->v4:
> - Replaced spinlock in EM framework by smp_store_release/READ_ONCE
> - Fixed missing locks to protect rcu_assign_pointer in EM framework
> - Fixed capacity calculation in EM framework on 32 bits system
> - Fixed compilation issue for CONFIG_ENERGY_MODEL=n
> - Removed cpumask from struct em_freq_domain, now dynamically allocated
> - Power costs of the EM are specified in milliwatts
> - Added example of CPUFreq driver modification
> - Added doc/comments in the EM framework and better commit header
> - Fixed integration issue with util_est in cpu_util_next()
> - Changed scheduler topology code to have one freq. dom. list per rd
> - Split sched topology patch in smaller patches
> - Added doc/comments explaining the heuristic in the wake-up path
> - Changed energy threshold for migration from 1.5% to 6%
>
> Changes v2[6]->v3:
> - Removed the PM_OPP dependency by implementing a new EM framework
> - Modified the scheduler topology code to take references on the EM data
>   structures
> - Simplified the overutilization mechanism into a system-wide flag
> - Reworked the integration in the wake-up path using the sd_ea shortcut
> - Rebased on tip/sched/core (247f2f6f3c70 "sched/core: Don't schedule
>   threads on pre-empted vCPUs")
>
> Changes v1[7]->v2:
> - Reworked interface between fair.c and energy.[ch] (Remove #ifdef
>   CONFIG_PM_OPP from energy.c) (Greg KH)
> - Fixed licence & header issue in energy.[ch] (Greg KH)
> - Reordered EAS path in select_task_rq_fair() (Joel)
> - Avoid prev_cpu if not allowed in select_task_rq_fair() (Morten/Joel)
> - Refactored compute_energy() (Patrick)
> - Account for RT/IRQ pressure in task_fits() (Patrick)
> - Use UTIL_EST and DL utilization during OPP estimation (Patrick/Juri)
> - Optimize selection of CPU candidates in the energy-aware wake-up path
> - Rebased on top of tip/sched/core (commit b720342849fe “sched/core:
>   Update Preempt_notifier_key to modern API”)
>
>
> 2. Test results
> ---------------
>
> Two fundamentally different tests were executed. Firstly the energy test
> case shows the impact on energy consumption this patch-set has using a
> synthetic set of tasks. Secondly the performance test case provides the
> conventional hackbench metric numbers.
>
> The tests were run on two arm64 big.LITTLE platforms: Hikey960 (4xA73 +
> 4xA53) and Juno r0 (2xA57 + 4xA53).
>
> Base kernel is tip/sched/core (4.18-rc5), with some Hikey960 and Juno
> specific patches, the SD_ASYM_CPUCAPACITY flag set at DIE sched domain
> level for arm64 and schedutil as cpufreq governor [8].
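The change log mentions factoring the utilization-to-frequency mapping out of schedutil, Energy Model power costs expressed in milliwatts, compute_energy(), and a migration energy threshold raised from 1.5% to 6%. As a rough illustration of how such pieces could fit together, here is a standalone C sketch, not code from the series. The struct and function names (cap_state, perf_domain, util_to_freq(), pd_energy_estimate(), worth_migrating()), the 25% frequency headroom, the per-OPP cost defined as power * max_freq / frequency, and the encoding of "about 6%" as a 1/16 shift are all assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical OPP entry of a performance domain (not the series' struct). */
struct cap_state {
	unsigned long frequency; /* kHz */
	unsigned long power;     /* mW */
	unsigned long cost;      /* assumed: power * max_freq / frequency */
};

/* Hypothetical performance domain: CPUs that share one frequency. */
struct perf_domain {
	const struct cap_state *table; /* sorted by ascending frequency */
	int nr_cap_states;
	unsigned long scale_cpu;       /* CPU capacity at max frequency */
};

/* Utilization -> frequency with an assumed 25% headroom (1.25 factor). */
static unsigned long util_to_freq(unsigned long util, unsigned long max_freq,
				  unsigned long cap)
{
	return (max_freq + (max_freq >> 2)) * util / cap;
}

/*
 * Estimated energy of one domain: the busiest CPU (max_util) selects the
 * lowest OPP that can serve it, and that OPP's cost is scaled by the
 * total utilization of the domain (sum_util).
 */
static unsigned long pd_energy_estimate(const struct perf_domain *pd,
					unsigned long max_util,
					unsigned long sum_util)
{
	const struct cap_state *cs;
	unsigned long freq;
	int i;

	assert(pd->nr_cap_states > 0);
	cs = &pd->table[pd->nr_cap_states - 1];
	freq = util_to_freq(max_util, cs->frequency, pd->scale_cpu);

	/* Falls back to the highest OPP if none is fast enough. */
	for (i = 0; i < pd->nr_cap_states; i++) {
		cs = &pd->table[i];
		if (cs->frequency >= freq)
			break;
	}
	return cs->cost * sum_util / pd->scale_cpu;
}

/* Migrate only if the saving exceeds prev_energy / 16 (about 6.25%). */
static int worth_migrating(unsigned long prev_energy, unsigned long best_energy)
{
	return best_energy < prev_energy &&
	       prev_energy - best_energy > (prev_energy >> 4);
}

/* Made-up little-CPU domain with two OPPs, for illustration only. */
static const struct cap_state little_table[] = {
	{ .frequency =  500000, .power =  50, .cost = 100 }, /* 50 * 1e6 / 5e5 */
	{ .frequency = 1000000, .power = 150, .cost = 150 },
};

static const struct perf_domain little_pd = {
	.table = little_table,
	.nr_cap_states = 2,
	.scale_cpu = 512,
};
```

With the made-up little domain, a maximum utilization of 200 maps to about 488 MHz, so the 500 MHz OPP is selected and the estimate for a total utilization of 300 is 100 * 300 / 512 = 58 (integer division). The shift-based threshold avoids a division on the wake-up path, which is one plausible reason to express "about 6%" as 1/16.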
>
> 2.1 Energy test case
>
> 10 iterations of between 10 and 50 periodic rt-app tasks (16ms period,
> 5% duty-cycle) for 30 seconds with energy measurement. Unit is Joules.
> The goal is to save energy, so lower is better.
>
> 2.1.1 Hikey960
>
> Energy is measured with an ACME Cape on an instrumented board. Numbers
> include consumption of big and little CPUs, LPDDR memory, GPU and most
> of the other small components on the board. They do not include
> consumption of the radio chip (turned-off anyway) and external
> connectors.
>
> +----------+-----------------+-------------------------+
> |          | Without patches | With patches            |
> +----------+--------+--------+------------------+------+
> | Tasks nb |  Mean  |  RSD*  |       Mean       | RSD* |
> +----------+--------+--------+------------------+------+
> |       10 |  34.33 |   4.8% |  30.51 (-11.13%) | 6.4% |
> |       20 |  52.84 |   1.9% |  44.15 (-16.45%) | 2.0% |
> |       30 |  66.20 |   1.8% |  60.14 (-9.15%)  | 4.8% |
> |       40 |  90.83 |   2.5% |  86.91 (-4.32%)  | 2.7% |
> |       50 | 136.76 |   4.6% | 108.90 (-20.37%) | 4.7% |
> +----------+--------+--------+------------------+------+
>
> 2.1.2 Juno r0
>
> Energy is measured with the onboard energy meter. Numbers include
> consumption of big and little CPUs.
>
> +----------+-----------------+------------------------+
> |          | Without patches | With patches           |
> +----------+--------+--------+-----------------+------+
> | Tasks nb |  Mean  |  RSD*  |      Mean       | RSD* |
> +----------+--------+--------+-----------------+------+
> |       10 |  11.48 |   3.2% |  8.09 (-29.53%) | 3.1% |
> |       20 |  20.84 |   3.4% | 14.38 (-31.00%) | 1.1% |
> |       30 |  32.94 |   3.2% | 23.97 (-27.23%) | 1.0% |
> |       40 |  46.05 |   0.5% | 37.82 (-17.87%) | 6.2% |
> |       50 |  57.25 |   0.5% | 55.30 ( -3.41%) | 0.5% |
> +----------+--------+--------+-----------------+------+
>
>
> 2.2 Performance test case
>
> 30 iterations of perf bench sched messaging --pipe --thread --group G
> --loop L with G=[1 2 4 8] and L=50000 (Hikey960)/16000 (Juno r0).
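The percentages in parentheses in the energy tables above are the relative change of the mean with the patches against the mean without them, and the RSD columns are defined in the cover letter as std dev / mean. A small C sketch of both computations follows. The cover letter does not say whether the population or the sample standard deviation was used, so the population form here is an assumption, as are the helper names pct_change(), sqrt_newton() and rsd().

```c
#include <assert.h>

/* Relative change of the mean, in percent: (new - old) / old * 100. */
static double pct_change(double old_mean, double new_mean)
{
	return (new_mean - old_mean) / old_mean * 100.0;
}

/* Square root by Newton's method, so the sketch does not need libm. */
static double sqrt_newton(double x)
{
	double r;
	int i;

	if (x <= 0.0)
		return 0.0;
	r = x > 1.0 ? x : 1.0; /* start at or above sqrt(x) */
	for (i = 0; i < 64; i++)
		r = 0.5 * (r + x / r);
	return r;
}

/* RSD = std dev / mean, assuming the population standard deviation. */
static double rsd(const double *v, int n)
{
	double mean = 0.0, var = 0.0;
	int i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		mean += v[i];
	mean /= n;
	for (i = 0; i < n; i++)
		var += (v[i] - mean) * (v[i] - mean);
	var /= n;
	return sqrt_newton(var) / mean;
}
```

For example, pct_change(34.33, 30.51) evaluates to about -11.13, matching the first Hikey960 energy row, and pct_change(136.76, 108.90) gives about -20.37, matching the last one.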
>
> 2.2.1 Hikey960
>
> The impact of thermal capping was mitigated thanks to a heatsink, a
> fan, and a 30 sec delay between two successive executions. IPA is
> disabled to reduce the stddev.
>
> +----------------+-----------------+------------------------+
> |                | Without patches | With patches           |
> +--------+-------+---------+-------+----------------+-------+
> | Groups | Tasks |  Mean   | RSD*  |      Mean      | RSD*  |
> +--------+-------+---------+-------+----------------+-------+
> |      1 |    40 |    8.04 | 0.88% |  8.22 (+2.31%) | 1.76% |
> |      2 |    80 |   14.78 | 0.67% | 14.83 (+0.35%) | 0.59% |
> |      4 |   160 |   30.92 | 0.57% | 30.95 (+0.09%) | 0.51% |
> |      8 |   320 |   65.54 | 0.32% | 65.57 (+0.04%) | 0.46% |
> +--------+-------+---------+-------+----------------+-------+
>
> 2.2.2 Juno r0
>
> +----------------+-----------------+-----------------------+
> |                | Without patches | With patches          |
> +--------+-------+---------+-------+---------------+-------+
> | Groups | Tasks |  Mean   | RSD*  |     Mean      | RSD*  |
> +--------+-------+---------+-------+---------------+-------+
> |      1 |    40 |    7.74 | 0.13% |  7.82 (0.01%) | 0.12% |
> |      2 |    80 |   14.27 | 0.15% | 14.27 (0.00%) | 0.14% |
> |      4 |   160 |   27.07 | 0.35% | 26.96 (0.00%) | 0.18% |
> |      8 |   320 |   55.14 | 1.81% | 55.21 (0.00%) | 1.29% |
> +--------+-------+---------+-------+---------------+-------+
>
> *RSD: Relative Standard Deviation (std dev / mean)
>
>
> [1] https://marc.info/?l=linux-pm&m=153243513908731&w=2
> [2] https://marc.info/?l=linux-kernel&m=153069968022982&w=2
> [3] https://marc.info/?l=linux-kernel&m=153209362826476&w=2
> [4] https://marc.info/?l=linux-kernel&m=153018606728533&w=2
> [5] https://marc.info/?l=linux-kernel&m=152691273111941&w=2
> [6] https://marc.info/?l=linux-kernel&m=152302902427143&w=2
> [7] https://marc.info/?l=linux-kernel&m=152153905805048&w=2
> [8] http://www.linux-arm.org/git?p=linux-qp.git;a=shortlog;h=refs/heads/upstream/eas_v6
>
> Morten Rasmussen (1):
>   sched: Add over-utilization/tipping point indicator
>
> Quentin Perret (13):
>   sched: Relocate arch_scale_cpu_capacity
>   sched/cpufreq: Factor out utilization to frequency mapping
>   PM: Introduce an Energy Model management framework
>   PM / EM: Expose the Energy Model in sysfs
>   sched/topology: Reference the Energy Model of CPUs when available
>   sched/topology: Lowest CPU asymmetry sched_domain level pointer
>   sched/topology: Introduce sched_energy_present static key
>   sched/fair: Clean-up update_sg_lb_stats parameters
>   sched/cpufreq: Refactor the utilization aggregation method
>   sched/fair: Introduce an energy estimation helper function
>   sched/fair: Select an energy-efficient CPU on task wake-up
>   sched/topology: Make Energy Aware Scheduling depend on schedutil
>   OPTIONAL: cpufreq: dt: Register an Energy Model
>
>  drivers/cpufreq/cpufreq-dt.c     |  45 ++++-
>  drivers/cpufreq/cpufreq.c        |   4 +
>  include/linux/cpufreq.h          |   1 +
>  include/linux/energy_model.h     | 162 +++++++++++++++++
>  include/linux/sched/cpufreq.h    |   6 +
>  include/linux/sched/topology.h   |  19 ++
>  kernel/power/Kconfig             |  15 ++
>  kernel/power/Makefile            |   2 +
>  kernel/power/energy_model.c      | 289 +++++++++++++++++++++++++++++
>  kernel/sched/cpufreq_schedutil.c | 136 ++++++++++----
>  kernel/sched/fair.c              | 301 ++++++++++++++++++++++++++++---
>  kernel/sched/sched.h             |  65 ++++---
>  kernel/sched/topology.c          | 231 +++++++++++++++++++++++-
>  13 files changed, 1195 insertions(+), 81 deletions(-)
>  create mode 100644 include/linux/energy_model.h
>  create mode 100644 kernel/power/energy_model.c

I have looked at all of the patches in the series now and I don't really
have any major objections from the cpufreq (and generally PM) perspective.

There are some points of concern here and there, but they are mostly
details and things I would do differently; as a whole, this looks mostly
OK to me.

I will reply to the individual patches where there are issues in my view.

Thanks,
Rafael