From: Dietmar Eggemann
To: linux-kernel@vger.kernel.org, Peter Zijlstra, Quentin Perret,
    Thara Gopinath
Cc: linux-pm@vger.kernel.org, Morten Rasmussen, Chris Redpath,
    Patrick Bellasi, Valentin Schneider, "Rafael J . Wysocki",
    Greg Kroah-Hartman, Vincent Guittot, Viresh Kumar, Todd Kjos,
    Joel Fernandes, Juri Lelli, Steve Muckle, Eduardo Valentin
Subject: [RFC PATCH v2 0/6] Energy Aware Scheduling
Date: Fri, 6 Apr 2018 16:36:01 +0100
Message-Id: <20180406153607.17815-1-dietmar.eggemann@arm.com>

1. Overview

The Energy Aware Scheduler (EAS) based on Morten Rasmussen's posting on
LKML [1] is currently part of the AOSP Common Kernel and runs on today's
smartphones with Arm's big.LITTLE CPUs.
Based on the experience gained over the last two and a half years in
product development, we propose an energy-model-based task placement for
CPUs with asymmetric core capacities (e.g. Arm big.LITTLE or DynamIQ), to
align with the EAS adopted by the AOSP Common Kernel. We have developed a
simplified energy model, based on the physical active power/performance
curve of each core type, using existing SoC power/performance data already
known to the kernel. The energy model is used to select the most
energy-efficient CPU on which to place each task, taking utilization into
account.

1.1 Energy Model

A CPU with asymmetric core capacities features cores with significantly
different energy and performance characteristics. As the configurations
can vary greatly from one SoC to another, designing an energy-efficient
scheduling heuristic that performs well on a broad spectrum of platforms
appears to be particularly hard.
This proposal attempts to solve this issue by providing the scheduler with
an energy model of the platform which enables energy impact estimation of
scheduling decisions in a generic way. The energy model is kept very simple
as it represents only the active power of CPUs at all available P-states
and relies on existing data in the kernel (only used by the thermal
subsystem so far).
This proposal does not include the power consumption of C-states and
cluster-level resources which were originally introduced in [1] since,
firstly, their impact on task placement decisions appears to be negligible
on modern asymmetric platforms and, secondly, they require additional
infrastructure and data (e.g. new DT entries).
The scheduler is also informed of the span of frequency domains, hence
enabling an accurate accounting of the energy costs of frequency changes.
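As a rough illustration of the estimation described above, here is a
minimal Python sketch. It is not the code from this series: the names
(PState, FreqDomain, best_cpu) and all capacity/power numbers are
hypothetical, and it assumes that every CPU of a frequency domain runs at
the lowest P-state that fits the busiest CPU of that domain, and that
active energy scales with power * utilization / capacity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PState:
    capacity: int  # compute capacity at this P-state (hypothetical scale)
    power: int     # active power at this P-state (arbitrary units)


@dataclass(frozen=True)
class FreqDomain:
    cpus: tuple     # ids of the CPUs sharing one clock
    pstates: tuple  # PState entries, sorted by ascending capacity


def domain_energy(fd, util):
    """Estimate the active energy of one frequency domain."""
    max_util = max(util[c] for c in fd.cpus)
    # Assumption: the whole domain runs at the lowest P-state that fits
    # its busiest CPU, or at the highest P-state if none fits.
    ps = next((p for p in fd.pstates if p.capacity >= max_util),
              fd.pstates[-1])
    return sum(ps.power * util[c] / ps.capacity for c in fd.cpus)


def system_energy(domains, util):
    """Sum the estimated energy of all frequency domains."""
    return sum(domain_energy(fd, util) for fd in domains)


def best_cpu(domains, util, task_util):
    """Return (cpu, energy) for the placement with the lowest estimate."""
    best, best_energy = None, float("inf")
    for fd in domains:
        for cpu in fd.cpus:
            trial = dict(util)
            trial[cpu] += task_util
            if trial[cpu] > fd.pstates[-1].capacity:
                continue  # the task would not fit on this CPU
            energy = system_energy(domains, trial)
            if energy < best_energy:
                best, best_energy = cpu, energy
    return best, best_energy


# Hypothetical 2 little + 2 big platform with two P-states per domain.
LITTLE = FreqDomain(cpus=(0, 1), pstates=(PState(200, 50), PState(400, 150)))
BIG = FreqDomain(cpus=(2, 3), pstates=(PState(512, 400), PState(1024, 1600)))
DOMAINS = (LITTLE, BIG)
util = {0: 100, 1: 50, 2: 300, 3: 0}

print(best_cpu(DOMAINS, util, 150))  # (1, 309.375)
```

In this made-up example CPU 1 beats CPU 0 even though both are little
CPUs: placing the task on CPU 0 would push the shared little frequency
domain to its higher P-state, which is the kind of cost that knowledge of
the frequency-domain span makes visible.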
Knowing the span of frequency domains appears to be especially important
for future Arm CPU topologies (DynamIQ), where the span of scheduling
domains can be different from the span of frequency domains.

1.2 Overutilization/Tipping Point

The primary job of the task scheduler is to deliver the highest possible
throughput with minimal latency. With increasing utilization, the
opportunities for the scheduler to save energy become rarer. There must be
spare CPU time available to place tasks based on utilization in an
energy-aware fashion, i.e. to pack tasks on energy-efficient CPUs without
unnecessarily constraining the task throughput. This spare CPU time
decreases towards zero when the utilization of the system rises.
To cope with this situation, we introduce the concept of overutilization
in order to enable/disable EAS depending on system utilization.
The point at which a system switches from being not overutilized to being
overutilized, or vice versa, is called the tipping point. A
per-sched-domain tipping point indicator implementation is introduced
here.

1.3 Wakeup path

On a system which has an energy model, the energy-aware wakeup path takes
precedence over affine and capacity-based wakeup when the lowest sched
domain of the task's previous CPU is not overutilized. The energy-aware
algorithm tries to find a new target CPU, among the CPUs of the highest
non-overutilized domain which includes the previous and the current CPU,
for which the placement of the task would add the least to the energy
consumption. The energy model is only enabled on CPUs with asymmetric core
capacities (SD_ASYM_CPUCAPACITY). These systems typically have 8 cores or
fewer.

2. Tests

Two fundamentally different tests were executed. Firstly, the energy test
case shows the impact this patch-set has on energy consumption, using a
synthetic set of tasks. Secondly, the performance test case provides the
conventional hackbench metric numbers.

The tests run on two arm64 big.LITTLE platforms: Hikey960 (4xA73 + 4xA53)
and Juno r0 (2xA57 + 4xA53).
Base kernel is tip/sched/core (4.16-rc6), with some Hikey960 and Juno
specific patches, the SD_ASYM_CPUCAPACITY flag set at DIE sched domain
level for arm64 and schedutil as cpufreq governor [2].

2.1 Energy test case

10 iterations of between 10 and 50 periodic rt-app tasks (16ms period, 5%
duty-cycle) for 30 seconds with energy measurement. Unit is Joules. The
goal is to save energy, so lower is better.

2.1.1 Hikey960

Energy is measured with an ACME Cape on an instrumented board. Numbers
include consumption of big and little CPUs, LPDDR memory, GPU and most of
the other small components on the board. They do not include consumption
of the radio chip (turned off anyway) and external connectors.

+----------+-----------------+-------------------------+
|          | Without patches | With patches            |
+----------+--------+--------+------------------+------+
| Tasks nb |  Mean  |  RSD*  |       Mean       | RSD* |
+----------+--------+--------+------------------+------+
|    10    |  41.14 |   1.4% |  36.51 (-11.25%) | 1.6% |
|    20    |  55.95 |   0.8% |  50.14 (-10.38%) | 1.9% |
|    30    |  74.37 |   0.2% |  72.89 ( -1.99%) | 5.3% |
|    40    |  94.12 |   0.7% |  87.78 ( -6.74%) | 4.5% |
|    50    | 117.88 |   0.2% | 111.66 ( -5.28%) | 0.9% |
+----------+--------+--------+------------------+------+

2.1.2 Juno r0

Energy is measured with the onboard energy meter. Numbers include
consumption of big and little CPUs.
+----------+-----------------+-------------------------+
|          | Without patches | With patches            |
+----------+--------+--------+------------------+------+
| Tasks nb |  Mean  |  RSD*  |       Mean       | RSD* |
+----------+--------+--------+------------------+------+
|    10    |  11.25 |   3.1% |   7.07 (-37.16%) | 2.1% |
|    20    |  19.18 |   1.1% |  12.75 (-33.52%) | 2.2% |
|    30    |  28.81 |   1.9% |  21.29 (-26.10%) | 1.5% |
|    40    |  36.83 |   1.2% |  30.72 (-16.59%) | 0.6% |
|    50    |  46.41 |   0.6% |  46.02 ( -0.01%) | 0.5% |
+----------+--------+--------+------------------+------+

2.2 Performance test case

30 iterations of

  perf bench sched messaging --pipe --thread --group G --loop L

with G=[1 2 4 8] and L=50000 (Hikey960)/16000 (Juno r0).

2.2.1 Hikey960

The impact of thermal capping was mitigated thanks to a heatsink, a fan,
and a 10 sec delay between two successive executions.

+----------------+-----------------+-------------------------+
|                | Without patches | With patches            |
+--------+-------+---------+-------+-----------------+-------+
| Groups | Tasks |  Mean   | RSD*  |      Mean       | RSD*  |
+--------+-------+---------+-------+-----------------+-------+
|   1    |  40   |    8.58 | 0.81% | 10.34 (+21.24%) | 4.35% |
|   2    |  80   |   15.33 | 0.79% | 15.56  (+1.51%) | 1.04% |
|   4    |  160  |   31.75 | 0.52% | 31.85  (+0.29%) | 0.54% |
|   8    |  320  |   67.00 | 0.36% | 66.79  (-0.30%) | 0.43% |
+--------+-------+---------+-------+-----------------+-------+

2.2.2 Juno r0

+----------------+-----------------+-------------------------+
|                | Without patches | With patches            |
+--------+-------+---------+-------+-----------------+-------+
| Groups | Tasks |  Mean   | RSD*  |      Mean       | RSD*  |
+--------+-------+---------+-------+-----------------+-------+
|   1    |  40   |    8.44 | 0.12% |  8.39  (-0.01%) | 0.10% |
|   2    |  80   |   14.65 | 0.11% | 14.73  ( 0.01%) | 0.12% |
|   4    |  160  |   27.34 | 0.14% | 27.47  ( 0.00%) | 0.14% |
|   8    |  320  |   53.88 | 0.25% | 54.34  ( 0.01%) | 0.30% |
+--------+-------+---------+-------+-----------------+-------+

*RSD: Relative Standard Deviation (std dev / mean)
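For readers who want to reproduce the derived columns, the following
Python sketch shows how a relative change between two means and an RSD can
be computed. It is only an illustration: the function names are
hypothetical, the per-iteration samples are made up, and the use of the
sample standard deviation (rather than the population one) is an
assumption, since the text above only states "std dev / mean".

```python
from statistics import mean, stdev


def relative_change(base, patched):
    """Relative change of the patched mean with respect to the base mean."""
    return (patched - base) / base


def rsd(samples):
    """Relative standard deviation: std dev / mean.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return stdev(samples) / mean(samples)


# Hikey960 energy means for 10 tasks, taken from the table in 2.1.1.
print(f"{100 * relative_change(41.14, 36.51):.2f}%")  # -11.25%

# Made-up per-iteration measurements, only to exercise rsd().
print(f"{100 * rsd([10.0, 12.0, 14.0]):.1f}%")  # 16.7%
```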
3. Dependencies

This series depends on additional infrastructure being merged in the OPP
core. As this infrastructure can also be useful for other clients, the
related patches have been posted separately [3].

4. Changes between versions

Changes v1[4]->v2:

- Reworked interface between fair.c and energy.[ch] (Remove #ifdef
  CONFIG_PM_OPP from energy.c) (Greg KH)
- Fixed licence & header issue in energy.[ch] (Greg KH)
- Reordered EAS path in select_task_rq_fair() (Joel)
- Avoid prev_cpu if not allowed in select_task_rq_fair() (Morten/Joel)
- Refactored compute_energy() (Patrick)
- Account for RT/IRQ pressure in task_fits() (Patrick)
- Use UTIL_EST and DL utilization during OPP estimation (Patrick/Juri)
- Optimize selection of CPU candidates in the energy-aware wake-up path
- Rebased on top of tip/sched/core (commit b720342849fe "sched/core:
  Update Preempt_notifier_key to modern API")

[1] https://lkml.org/lkml/2015/7/7/754
[2] http://www.linux-arm.org/git?p=linux-de.git;a=shortlog;h=refs/heads/upstream/eas_v2_base
[3] https://marc.info/?l=linux-pm&m=151635516419249&w=2
[4] https://marc.info/?l=linux-pm&m=152153905805048&w=2

Dietmar Eggemann (1):
  sched/fair: Create util_fits_capacity()

Quentin Perret (4):
  sched: Introduce energy models of CPUs
  sched/fair: Introduce an energy estimation helper function
  sched/fair: Select an energy-efficient CPU on task wake-up
  drivers: base: arch_topology.c: Enable EAS for arm/arm64 platforms

Thara Gopinath (1):
  sched: Add over-utilization/tipping point indicator

 drivers/base/arch_topology.c   |   2 +
 include/linux/sched/energy.h   |  69 ++++++++++++
 include/linux/sched/topology.h |   1 +
 kernel/sched/Makefile          |   3 +
 kernel/sched/energy.c          | 184 ++++++++++++++++++++++++++++++++
 kernel/sched/fair.c            | 234 +++++++++++++++++++++++++++++++++++++++--
 kernel/sched/sched.h           |   3 +-
 kernel/sched/topology.c        |  12 +--
 8 files changed, 491 insertions(+), 17 deletions(-)
 create mode 100644 include/linux/sched/energy.h
 create mode 100644 kernel/sched/energy.c

-- 
2.11.0