Received: by 2002:ac0:a5a6:0:0:0:0:0 with SMTP id m35-v6csp5180526imm; Wed, 12 Sep 2018 02:14:27 -0700 (PDT) X-Google-Smtp-Source: ANB0VdaqbHFafiYwP0hkR9aWN5EtOEmFz2l40uKvgL3Ubn1PhKg+Mlcbwtnra9GyWIrrsL0jGcE+ X-Received: by 2002:a63:4745:: with SMTP id w5-v6mr1133517pgk.140.1536743667704; Wed, 12 Sep 2018 02:14:27 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1536743667; cv=none; d=google.com; s=arc-20160816; b=HL3IBPWWzt2y70E6Ofb1XWRBi6QNiCETfhcVNDeFaOO8iVyTR60ArYwpRrz5kEJUml QRaFzFbjJUvxjsa4a2N4FqwSWhmPsve+8mSDdve67Zeru+mUj8CMU61nhAucWwq+BMcL S8rDZrgi8xDAuGF2MHseA2vqNm3TQKJa08rCD2HpYWfeopdxtBmOeMWgesGqjllIanF+ dFa3oup5+kGg1bHLTIvPnT8x08No1MtlplAWKin83OFUS8MWZ5Uao5bJI+olO/K5aNv9 Uen64GY8Hb9IzGKXBbUTNB1JiTZAGuoHOjBqq2eg12rdNdJKB2eQ31RAQj1E/JiNbKDL 1rTw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from; bh=jFes/PgeccX1iVVoclz0MAeTRFE9FWc9YfMx60FABpg=; b=tYHz3jTK7ujXcFJgdLtcmrjt/OFSaaTiLZOqiFiTgLzk1MO7j1LxDZPxSpmQVTRLES 7jdCDsG94p5oRyoStZR+z/Pg3eK1d/+nikB5zaqn8JmbNf+ZEDYI+tTcq9VQa/gS3UoK b97+BKoBJ02aVM5O/NUFQbPOMRq/+buDHyFveHNXEOfOlbRRdqMPbTa1zYhNIDcvjMOg KA5RpSpQLbk4DcJbQ8w9yN80WPj9e204NUdW+ElNbge+83KH76RroPQMu4UP+5EZboUW Km51prQD77o1SQO9RfLIwlRyxGD5gifiZBkU4JX8FiIPHbxHf6zQ4HwgGBquTkH0IbWq 83hA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id r18-v6si503908pgj.194.2018.09.12.02.14.11; Wed, 12 Sep 2018 02:14:27 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728171AbeILORZ (ORCPT + 99 others); Wed, 12 Sep 2018 10:17:25 -0400 Received: from usa-sjc-mx-foss1.foss.arm.com ([217.140.101.70]:55894 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728068AbeILORY (ORCPT ); Wed, 12 Sep 2018 10:17:24 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 92E077A9; Wed, 12 Sep 2018 02:13:47 -0700 (PDT) Received: from queper01-lin.local (queper01-lin.emea.arm.com [10.4.13.27]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 7B66E3F703; Wed, 12 Sep 2018 02:13:43 -0700 (PDT) From: Quentin Perret To: peterz@infradead.org, rjw@rjwysocki.net, linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org Cc: gregkh@linuxfoundation.org, mingo@redhat.com, dietmar.eggemann@arm.com, morten.rasmussen@arm.com, chris.redpath@arm.com, patrick.bellasi@arm.com, valentin.schneider@arm.com, vincent.guittot@linaro.org, thara.gopinath@linaro.org, viresh.kumar@linaro.org, tkjos@google.com, joel@joelfernandes.org, smuckle@google.com, adharmap@codeaurora.org, skannan@codeaurora.org, pkondeti@codeaurora.org, juri.lelli@redhat.com, edubezval@gmail.com, srinivas.pandruvada@linux.intel.com, currojerez@riseup.net, javi.merino@kernel.org, quentin.perret@arm.com Subject: [PATCH v7 06/14] sched/topology: Reference the Energy Model of CPUs when available Date: Wed, 12 Sep 2018 10:13:01 +0100 Message-Id: <20180912091309.7551-7-quentin.perret@arm.com> X-Mailer: git-send-email 2.18.0 In-Reply-To: <20180912091309.7551-1-quentin.perret@arm.com> References: <20180912091309.7551-1-quentin.perret@arm.com> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The existing scheduling domain hierarchy is defined to map to the cache topology of the system. However, Energy Aware Scheduling (EAS) requires more knowledge about the platform, and specifically needs to know about the span of Performance Domains (PD), which do not always align with caches. To address this issue, use the Energy Model (EM) of the system to extend the scheduler topology code with a representation of the PDs, alongside the scheduling domains. More specifically, a linked list of PDs is attached to each root domain. When multiple root domains are in use, each list contains only the PDs covering the CPUs of its root domain. If a PD spans over CPUs of two different root domains, it will be duplicated in both lists. The lists are fully maintained by the scheduler from partition_sched_domains() in order to cope with hotplug and cpuset changes. As for scheduling domains, the list are protected by RCU to ensure safe concurrent updates. The linked lists should be used only from code paths protected by the ENERGY_AWARE sched_feat. Cc: Ingo Molnar Cc: Peter Zijlstra Signed-off-by: Quentin Perret --- kernel/sched/sched.h | 21 +++++++ kernel/sched/topology.c | 134 ++++++++++++++++++++++++++++++++++++++-- 2 files changed, 151 insertions(+), 4 deletions(-) diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index d59effb34786..9922615592a8 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -44,6 +44,7 @@ #include #include #include +#include #include #include #include @@ -700,6 +701,12 @@ static inline bool sched_asym_prefer(int a, int b) return arch_asym_cpu_priority(a) > arch_asym_cpu_priority(b); } +struct perf_domain { + struct em_perf_domain *obj; + struct perf_domain *next; + struct rcu_head rcu; +}; + /* * We add the notion of a root-domain which will be used to define per-domain * variables. Each exclusive cpuset essentially defines an island domain by @@ -752,6 +759,12 @@ struct root_domain { struct cpupri cpupri; unsigned long max_cpu_capacity; + + /* + * NULL-terminated list of performance domains intersecting with the + * CPUs of the rd. Protected by RCU. + */ + struct perf_domain *pd; }; extern struct root_domain def_root_domain; @@ -2242,3 +2255,11 @@ unsigned long scale_irq_capacity(unsigned long util, unsigned long irq, unsigned return util; } #endif + +#ifdef CONFIG_SMP +#ifdef CONFIG_ENERGY_MODEL +#define perf_domain_span(pd) (to_cpumask(((pd)->obj->cpus))) +#else +#define perf_domain_span(pd) NULL +#endif +#endif diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c index c4444ed77a55..8019a9bf281d 100644 --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -201,6 +201,116 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent) return 1; } +#ifdef CONFIG_ENERGY_MODEL +static void free_pd(struct perf_domain *pd) +{ + struct perf_domain *tmp; + + while (pd) { + tmp = pd->next; + kfree(pd); + pd = tmp; + } +} + +static struct perf_domain *find_pd(struct perf_domain *pd, int cpu) +{ + while (pd) { + if (cpumask_test_cpu(cpu, perf_domain_span(pd))) + return pd; + pd = pd->next; + } + + return NULL; +} + +static struct perf_domain *pd_init(int cpu) +{ + struct em_perf_domain *obj = em_cpu_get(cpu); + struct perf_domain *pd; + + if (!obj) { + if (sched_debug()) + pr_info("%s: no EM found for CPU%d\n", __func__, cpu); + return NULL; + } + + pd = kzalloc(sizeof(*pd), GFP_KERNEL); + if (!pd) + return NULL; + pd->obj = obj; + + return pd; +} + +static void perf_domain_debug(const struct cpumask *cpu_map, + struct perf_domain *pd) +{ + if (!sched_debug() || !pd) + return; + + printk(KERN_DEBUG "root_domain %*pbl: ", cpumask_pr_args(cpu_map)); + + while (pd) { + printk(KERN_CONT " pd%d:{ cpus=%*pbl nr_cstate=%d }", + cpumask_first(perf_domain_span(pd)), + cpumask_pr_args(perf_domain_span(pd)), + em_pd_nr_cap_states(pd->obj)); + pd = pd->next; + } + + printk(KERN_CONT "\n"); +} + +static void destroy_perf_domain_rcu(struct rcu_head *rp) +{ + struct perf_domain *pd; + + pd = container_of(rp, struct perf_domain, rcu); + free_pd(pd); +} + +static void build_perf_domains(const struct cpumask *cpu_map) +{ + struct perf_domain *pd = NULL, *tmp; + int cpu = cpumask_first(cpu_map); + struct root_domain *rd = cpu_rq(cpu)->rd; + int i; + + for_each_cpu(i, cpu_map) { + /* Skip already covered CPUs. */ + if (find_pd(pd, i)) + continue; + + /* Create the new pd and add it to the local list. */ + tmp = pd_init(i); + if (!tmp) + goto free; + tmp->next = pd; + pd = tmp; + } + + perf_domain_debug(cpu_map, pd); + + /* Attach the new list of performance domains to the root domain. */ + tmp = rd->pd; + rcu_assign_pointer(rd->pd, pd); + if (tmp) + call_rcu(&tmp->rcu, destroy_perf_domain_rcu); + + return; + +free: + free_pd(pd); + tmp = rd->pd; + rcu_assign_pointer(rd->pd, NULL); + if (tmp) + call_rcu(&tmp->rcu, destroy_perf_domain_rcu); +} +#else +static void free_pd(struct perf_domain *pd) { } +#endif /* CONFIG_ENERGY_MODEL */ + static void free_rootdomain(struct rcu_head *rcu) { struct root_domain *rd = container_of(rcu, struct root_domain, rcu); @@ -211,6 +321,7 @@ static void free_rootdomain(struct rcu_head *rcu) free_cpumask_var(rd->rto_mask); free_cpumask_var(rd->online); free_cpumask_var(rd->span); + free_pd(rd->pd); kfree(rd); } @@ -1964,8 +2075,8 @@ void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[], /* Destroy deleted domains: */ for (i = 0; i < ndoms_cur; i++) { for (j = 0; j < n && !new_topology; j++) { - if (cpumask_equal(doms_cur[i], doms_new[j]) - && dattrs_equal(dattr_cur, i, dattr_new, j)) + if (cpumask_equal(doms_cur[i], doms_new[j]) && + dattrs_equal(dattr_cur, i, dattr_new, j)) goto match1; } /* No match - a current sched domain not in new doms_new[] */ @@ -1985,8 +2096,8 @@ void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[], /* Build new domains: */ for (i = 0; i < ndoms_new; i++) { for (j = 0; j < n && !new_topology; j++) { - if (cpumask_equal(doms_new[i], doms_cur[j]) - && dattrs_equal(dattr_new, i, dattr_cur, j)) + if (cpumask_equal(doms_new[i], doms_cur[j]) && + dattrs_equal(dattr_new, i, dattr_cur, j)) goto match2; } /* No match - add a new doms_new */ @@ -1995,6 +2106,21 @@ void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[], ; } +#ifdef CONFIG_ENERGY_MODEL + /* Build perf. domains: */ + for (i = 0; i < ndoms_new; i++) { + for (j = 0; j < n; j++) { + if (cpumask_equal(doms_new[i], doms_cur[j]) && + cpu_rq(cpumask_first(doms_cur[j]))->rd->pd) + goto match3; + } + /* No match - add perf. domains for a new rd */ + build_perf_domains(doms_new[i]); +match3: + ; + } +#endif + /* Remember the new sched domains: */ if (doms_cur != &fallback_doms) free_sched_domains(doms_cur, ndoms_cur); -- 2.18.0