From: Dietmar Eggemann
To: linux-kernel@vger.kernel.org, Peter Zijlstra, Quentin Perret, Thara Gopinath
Cc: linux-pm@vger.kernel.org, Morten Rasmussen, Chris Redpath, Patrick Bellasi, Valentin Schneider, "Rafael J. Wysocki", Greg Kroah-Hartman, Vincent Guittot, Viresh Kumar, Todd Kjos, Joel Fernandes, Juri Lelli, Steve Muckle, Eduardo Valentin
Subject: [RFC PATCH v2 2/6] sched: Introduce energy models of CPUs
Date: Fri, 6 Apr 2018 16:36:03 +0100
Message-Id: <20180406153607.17815-3-dietmar.eggemann@arm.com>
In-Reply-To: <20180406153607.17815-1-dietmar.eggemann@arm.com>
References: <20180406153607.17815-1-dietmar.eggemann@arm.com>

From: Quentin Perret

The energy consumption of each CPU in the system is modeled with a list
of values representing its dissipated power and compute capacity at each
available Operating Performance Point (OPP). These values are derived
from existing information in the kernel (currently used by the thermal
subsystem) and don't require the introduction of new platform-specific
tunables. The energy model is also provided with a simple representation
of all frequency domains as cpumasks, hence enabling the scheduler to be
aware of dependencies between CPUs. The data required to build the energy
model is provided by the OPP library which enables an abstract view of
the platform from the scheduler. The new data structures holding these
models and the routines to populate them are stored in
kernel/sched/energy.c.

For the sake of simplicity, it is assumed in the energy model that all
CPUs in a frequency domain share the same micro-architecture. As long as
this assumption is correct, the energy models of different CPUs belonging
to the same frequency domain are equal. Hence, this commit builds only
one energy model per frequency domain, and links all relevant CPUs to it
in order to save time and memory. If needed for future hardware
platforms, relaxing this assumption should imply relatively simple
modifications in the code but a significantly higher algorithmic
complexity.

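To illustrate the "one energy model per frequency domain" idea described above, here is a minimal user-space sketch. It is not the kernel code from this patch: the CPU count, the domain bitmasks, the table values and the helper name link_energy_models() are all hypothetical, and plain bitmasks stand in for cpumasks.

```c
#include <assert.h>

#define NR_CPUS		4	/* hypothetical CPU count */
#define NR_DOMAINS	2	/* hypothetical number of frequency domains */

struct capacity_state {
	unsigned long cap;	/* compute capacity */
	unsigned long power;	/* power consumption at this capacity */
};

struct energy_model {
	int nr_cap_states;
	const struct capacity_state *cap_states;
};

/* Hypothetical frequency domains as bitmasks: CPUs 0-1 and CPUs 2-3. */
static const unsigned long domain_span[NR_DOMAINS] = { 0x3, 0xc };

/* Hypothetical per-domain tables; the numbers are made up. */
static const struct capacity_state little_cs[] = {
	{ 150, 30 }, { 300, 80 }, { 450, 160 },
};
static const struct capacity_state big_cs[] = {
	{ 400, 200 }, { 1024, 900 },
};
static struct energy_model domain_em[NR_DOMAINS] = {
	{ 3, little_cs },
	{ 2, big_cs },
};

static struct energy_model *cpu_em[NR_CPUS];

/*
 * Link each CPU to the model of its frequency domain. Returns the number of
 * models used, which is one per domain rather than one per CPU.
 */
static int link_energy_models(void)
{
	int cpu, sibling, d, nr_models = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu_em[cpu])
			continue;	/* already linked via a sibling */

		for (d = 0; d < NR_DOMAINS; d++)
			if (domain_span[d] & (1UL << cpu))
				break;
		assert(d < NR_DOMAINS);	/* every CPU must be in a domain */

		for (sibling = 0; sibling < NR_CPUS; sibling++)
			if (domain_span[d] & (1UL << sibling))
				cpu_em[sibling] = &domain_em[d];
		nr_models++;
	}
	return nr_models;
}
```

With these made-up domains, CPUs 0 and 1 end up pointing at the same model, as do CPUs 2 and 3, so only two models are needed for four CPUs.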
As it appears that energy-aware scheduling really makes a difference on
heterogeneous systems (e.g. big.LITTLE platforms), it is restricted to
systems having:

   1. SD_ASYM_CPUCAPACITY flag set
   2. Dynamic Voltage and Frequency Scaling (DVFS) is enabled
   3. Available power estimates for the OPPs of all possible CPUs

Moreover, the scheduler is notified of the energy model availability
using a static key in order to minimize the overhead on non-energy-aware
systems.

Cc: Ingo Molnar
Cc: Peter Zijlstra
Signed-off-by: Quentin Perret
Signed-off-by: Dietmar Eggemann

---
This patch depends on additional infrastructure being merged in the OPP
core. As this infrastructure can also be useful for other clients, the
related patches have been posted separately [1].

[1] https://marc.info/?l=linux-pm&m=151635516419249&w=2

---
 include/linux/sched/energy.h |  49 ++++++++++++
 kernel/sched/Makefile        |   3 +
 kernel/sched/energy.c        | 184 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 236 insertions(+)
 create mode 100644 include/linux/sched/energy.h
 create mode 100644 kernel/sched/energy.c

diff --git a/include/linux/sched/energy.h b/include/linux/sched/energy.h
new file mode 100644
index 000000000000..941071eec013
--- /dev/null
+++ b/include/linux/sched/energy.h
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef _LINUX_SCHED_ENERGY_H
+#define _LINUX_SCHED_ENERGY_H
+
+struct capacity_state {
+	unsigned long cap;	/* compute capacity */
+	unsigned long power;	/* power consumption at this compute capacity */
+};
+
+struct sched_energy_model {
+	int nr_cap_states;
+	struct capacity_state *cap_states;
+};
+
+struct freq_domain {
+	struct list_head next;
+	cpumask_t span;
+};
+
+#if defined(CONFIG_SMP) && defined(CONFIG_PM_OPP)
+extern struct sched_energy_model ** __percpu energy_model;
+extern struct static_key_false sched_energy_present;
+extern struct list_head sched_freq_domains;
+
+static inline bool sched_energy_enabled(void)
+{
+	return static_branch_unlikely(&sched_energy_present);
+}
+
+static inline struct cpumask *freq_domain_span(struct freq_domain *fd)
+{
+	return &fd->span;
+}
+
+extern void init_sched_energy(void);
+
+#define for_each_freq_domain(fdom) \
+	list_for_each_entry(fdom, &sched_freq_domains, next)
+
+#else
+struct freq_domain;
+static inline bool sched_energy_enabled(void) { return false; }
+static inline struct cpumask
+*freq_domain_span(struct freq_domain *fd) { return NULL; }
+static inline void init_sched_energy(void) { }
+#define for_each_freq_domain(fdom) for (; fdom; fdom = NULL)
+#endif
+
+#endif /* _LINUX_SCHED_ENERGY_H */
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index d9a02b318108..15fb3dfd7064 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -29,3 +29,6 @@ obj-$(CONFIG_CPU_FREQ) += cpufreq.o
 obj-$(CONFIG_CPU_FREQ_GOV_SCHEDUTIL) += cpufreq_schedutil.o
 obj-$(CONFIG_MEMBARRIER) += membarrier.o
 obj-$(CONFIG_CPU_ISOLATION) += isolation.o
+ifeq ($(CONFIG_PM_OPP),y)
+        obj-$(CONFIG_SMP) += energy.o
+endif
diff --git a/kernel/sched/energy.c b/kernel/sched/energy.c
new file mode 100644
index 000000000000..704bea6e1cad
--- /dev/null
+++ b/kernel/sched/energy.c
@@ -0,0 +1,184 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Energy-aware scheduling models
+ *
+ * Copyright (C) 2018, Arm Ltd.
+ * Written by: Quentin Perret, Arm Ltd.
+ */
+
+#define pr_fmt(fmt) "sched-energy: " fmt
+
+#include
+#include
+#include
+
+#include "sched.h"
+
+DEFINE_STATIC_KEY_FALSE(sched_energy_present);
+struct sched_energy_model ** __percpu energy_model;
+
+/*
+ * A copy of the cpumasks representing the frequency domains is kept private
+ * to the scheduler. They are stacked in a dynamically allocated linked list
+ * as we don't know how many frequency domains the system has.
+ */
+LIST_HEAD(sched_freq_domains);
+
+static struct sched_energy_model *build_energy_model(int cpu)
+{
+	unsigned long cap_scale = arch_scale_cpu_capacity(NULL, cpu);
+	unsigned long cap, freq, power, max_freq = ULONG_MAX;
+	unsigned long opp_eff, prev_opp_eff = ULONG_MAX;
+	struct sched_energy_model *em = NULL;
+	struct device *cpu_dev;
+	struct dev_pm_opp *opp;
+	int opp_cnt, i;
+
+	cpu_dev = get_cpu_device(cpu);
+	if (!cpu_dev) {
+		pr_err("CPU%d: Failed to get device\n", cpu);
+		return NULL;
+	}
+
+	opp_cnt = dev_pm_opp_get_opp_count(cpu_dev);
+	if (opp_cnt <= 0) {
+		pr_err("CPU%d: Failed to get # of available OPPs.\n", cpu);
+		return NULL;
+	}
+
+	opp = dev_pm_opp_find_freq_floor(cpu_dev, &max_freq);
+	if (IS_ERR(opp)) {
+		pr_err("CPU%d: Failed to get max frequency.\n", cpu);
+		return NULL;
+	}
+
+	dev_pm_opp_put(opp);
+	if (!max_freq) {
+		pr_err("CPU%d: Found null max frequency.\n", cpu);
+		return NULL;
+	}
+
+	em = kzalloc(sizeof(*em), GFP_KERNEL);
+	if (!em)
+		return NULL;
+
+	em->cap_states = kcalloc(opp_cnt, sizeof(*em->cap_states), GFP_KERNEL);
+	if (!em->cap_states)
+		goto free_em;
+
+	for (i = 0, freq = 0; i < opp_cnt; i++, freq++) {
+		opp = dev_pm_opp_find_freq_ceil(cpu_dev, &freq);
+		if (IS_ERR(opp)) {
+			pr_err("CPU%d: Failed to get OPP %d.\n", cpu, i+1);
+			goto free_cs;
+		}
+
+		power = dev_pm_opp_get_power(opp);
+		dev_pm_opp_put(opp);
+		if (!power || !freq)
+			goto free_cs;
+
+		cap = freq * cap_scale / max_freq;
+		em->cap_states[i].power = power;
+		em->cap_states[i].cap = cap;
+
+		/*
+		 * The capacity/watts efficiency ratio should decrease as the
+		 * frequency grows on sane platforms. If not, warn the user
+		 * that some high OPPs are more power efficient than some
+		 * of the lower ones.
+		 */
+		opp_eff = (cap << 20) / power;
+		if (opp_eff >= prev_opp_eff)
+			pr_warn("CPU%d: cap/pwr: OPP%d > OPP%d\n", cpu, i, i-1);
+		prev_opp_eff = opp_eff;
+	}
+
+	em->nr_cap_states = opp_cnt;
+	return em;
+
+free_cs:
+	kfree(em->cap_states);
+free_em:
+	kfree(em);
+	return NULL;
+}
+
+static void free_energy_model(void)
+{
+	struct sched_energy_model *em;
+	struct freq_domain *tmp, *pos;
+	int cpu;
+
+	list_for_each_entry_safe(pos, tmp, &sched_freq_domains, next) {
+		cpu = cpumask_first(&(pos->span));
+		em = *per_cpu_ptr(energy_model, cpu);
+		if (em) {
+			kfree(em->cap_states);
+			kfree(em);
+		}
+
+		list_del(&(pos->next));
+		kfree(pos);
+	}
+
+	free_percpu(energy_model);
+}
+
+void init_sched_energy(void)
+{
+	struct freq_domain *fdom;
+	struct sched_energy_model *em;
+	struct sched_domain *sd;
+	struct device *cpu_dev;
+	int cpu, ret, fdom_cpu;
+
+	/* Energy Aware Scheduling is used for asymmetric systems only. */
+	rcu_read_lock();
+	sd = lowest_flag_domain(smp_processor_id(), SD_ASYM_CPUCAPACITY);
+	rcu_read_unlock();
+	if (!sd)
+		return;
+
+	energy_model = alloc_percpu(struct sched_energy_model *);
+	if (!energy_model)
+		goto exit_fail;
+
+	for_each_possible_cpu(cpu) {
+		if (*per_cpu_ptr(energy_model, cpu))
+			continue;
+
+		/* Keep a copy of the sharing_cpus mask */
+		fdom = kzalloc(sizeof(struct freq_domain), GFP_KERNEL);
+		if (!fdom)
+			goto free_em;
+
+		cpu_dev = get_cpu_device(cpu);
+		ret = dev_pm_opp_get_sharing_cpus(cpu_dev, &(fdom->span));
+		if (ret)
+			goto free_em;
+		list_add(&(fdom->next), &sched_freq_domains);
+
+		/*
+		 * Build the energy model of one CPU, and link it to all CPUs
+		 * in its frequency domain. This should be correct as long as
+		 * they share the same micro-architecture.
+		 */
+		fdom_cpu = cpumask_first(&(fdom->span));
+		em = build_energy_model(fdom_cpu);
+		if (!em)
+			goto free_em;
+
+		for_each_cpu(fdom_cpu, &(fdom->span))
+			*per_cpu_ptr(energy_model, fdom_cpu) = em;
+	}
+
+	static_branch_enable(&sched_energy_present);
+
+	pr_info("Energy Aware Scheduling started.\n");
+	return;
+free_em:
+	free_energy_model();
+exit_fail:
+	pr_err("Energy Aware Scheduling initialization failed.\n");
+}
-- 
2.11.0
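
For readers who want to try the arithmetic of build_energy_model() outside the kernel, the following user-space sketch reproduces only the capacity scaling (cap = freq * cap_scale / max_freq) and the efficiency comparison ((cap << 20) / power). Everything else is an assumption made for illustration: struct opp is a hypothetical stand-in for the OPP library, fill_cap_states() is not a function from this patch, and the frequencies and power figures are made up.

```c
#include <assert.h>
#include <limits.h>

struct opp {			/* hypothetical stand-in for an OPP entry */
	unsigned long freq;	/* made-up frequency value */
	unsigned long power;	/* made-up power value */
};

struct capacity_state {
	unsigned long cap;	/* compute capacity */
	unsigned long power;	/* power consumption at this capacity */
};

/*
 * Fill 'cs' from 'opps' (sorted by ascending frequency) and return how many
 * OPPs are at least as efficient as the previous, lower one. The patch emits
 * a pr_warn() in that case; this sketch only counts them.
 */
static int fill_cap_states(const struct opp *opps, int nr,
			   unsigned long cap_scale, struct capacity_state *cs)
{
	unsigned long max_freq, eff, prev_eff = ULONG_MAX;
	int i, nr_inversions = 0;

	assert(nr > 0 && opps[nr - 1].freq > 0);
	max_freq = opps[nr - 1].freq;

	for (i = 0; i < nr; i++) {
		assert(opps[i].power > 0);
		cs[i].cap = opps[i].freq * cap_scale / max_freq;
		cs[i].power = opps[i].power;

		/* Same fixed-point ratio as the patch: cap / power, scaled by 2^20. */
		eff = (cs[i].cap << 20) / cs[i].power;
		if (eff >= prev_eff)
			nr_inversions++;
		prev_eff = eff;
	}
	return nr_inversions;
}
```

With three made-up OPPs of (500000, 100), (1000000, 300) and (2000000, 500) and a cap_scale of 1024, the capacities come out as 256, 512 and 1024, and the third OPP has a better capacity/power ratio than the second, which is the situation the warning in the patch is meant to flag.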