Date: Wed, 18 Apr 2018 12:17:29 +0100
From: Quentin Perret
To: Joel Fernandes
Cc: Dietmar Eggemann, LKML, Peter Zijlstra, Thara Gopinath, Linux PM,
    Morten Rasmussen, Chris Redpath, Patrick Bellasi,
    Valentin Schneider, "Rafael J .
    Wysocki", Greg Kroah-Hartman, Vincent Guittot, Viresh Kumar, Todd Kjos,
    Juri Lelli, Steve Muckle, Eduardo Valentin
Subject: Re: [RFC PATCH v2 3/6] sched: Add over-utilization/tipping point indicator
Message-ID: <20180418111729.GB6783@e108498-lin.cambridge.arm.com>
References: <20180406153607.17815-1-dietmar.eggemann@arm.com>
    <20180406153607.17815-4-dietmar.eggemann@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Friday 13 Apr 2018 at 16:56:39 (-0700), Joel Fernandes wrote:
> Hi,
>
> On Fri, Apr 6, 2018 at 8:36 AM, Dietmar Eggemann wrote:
> > From: Thara Gopinath
> >
> > Energy-aware scheduling should only operate when the system is not
> > overutilized. There must be cpu time available to place tasks based on
> > utilization in an energy-aware fashion, i.e. to pack tasks on
> > energy-efficient cpus without harming the overall throughput.
> >
> > In case the system operates above this tipping point the tasks have to
> > be placed based on task and cpu load in the classical way of spreading
> > tasks across as many cpus as possible.
> >
> > The point in which a system switches from being not overutilized to
> > being overutilized is called the tipping point.
> >
> > Such a tipping point indicator on a sched domain as the system
> > boundary is introduced here. As soon as one cpu of a sched domain is
> > overutilized the whole sched domain is declared overutilized as well.
> > A cpu becomes overutilized when its utilization is higher than 80%
> > (capacity_margin) of its capacity.
> >
> > The implementation takes advantage of the shared sched domain which is
> > shared across all per-cpu views of a sched domain level. The new
> > overutilized flag is placed in this shared sched domain.
> >
> > Load balancing is skipped in case the energy model is present and the
> > sched domain is not overutilized because under this condition the
> > predominantly load-per-capacity driven load-balancer should not
> > interfere with the energy-aware wakeup placement based on utilization.
> >
> > In case the total utilization of a sched domain is greater than the
> > total sched domain capacity the overutilized flag is set at the parent
> > sched domain level to let other sched groups help getting rid of the
> > overutilization of cpus.
> >
> > Signed-off-by: Thara Gopinath
> > Signed-off-by: Dietmar Eggemann
> > ---
> >  include/linux/sched/topology.h |  1 +
> >  kernel/sched/fair.c            | 62 ++++++++++++++++++++++++++++++++++++++++--
> >  kernel/sched/sched.h           |  1 +
> >  kernel/sched/topology.c        | 12 +++-----
> >  4 files changed, 65 insertions(+), 11 deletions(-)
> >
> > diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> > index 26347741ba50..dd001c232646 100644
> > --- a/include/linux/sched/topology.h
> > +++ b/include/linux/sched/topology.h
> > @@ -72,6 +72,7 @@ struct sched_domain_shared {
> >  	atomic_t	ref;
> >  	atomic_t	nr_busy_cpus;
> >  	int		has_idle_cores;
> > +	int		overutilized;
> >  };
> >
> >  struct sched_domain {
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 0a76ad2ef022..6960e5ef3c14 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5345,6 +5345,28 @@ static inline void hrtick_update(struct rq *rq)
> >  }
> >  #endif
> >
> > +#ifdef CONFIG_SMP
> > +static inline int cpu_overutilized(int cpu);
> > +
> > +static inline int sd_overutilized(struct sched_domain *sd)
> > +{
> > +	return READ_ONCE(sd->shared->overutilized);
> > +}
> > +
> > +static inline void update_overutilized_status(struct rq *rq)
> > +{
> > +	struct sched_domain *sd;
> > +
> > +	rcu_read_lock();
> > +	sd = rcu_dereference(rq->sd);
> > +	if (sd && !sd_overutilized(sd) && cpu_overutilized(rq->cpu))
> > +		WRITE_ONCE(sd->shared->overutilized, 1);
> > +	rcu_read_unlock();
> > +}
> > +#else
> > +static inline void update_overutilized_status(struct rq *rq) {}
> > +#endif /* CONFIG_SMP */
> > +
> >  /*
> >   * The enqueue_task method is called before nr_running is
> >   * increased. Here we update the fair scheduling stats and
> > @@ -5394,8 +5416,10 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> >  		update_cfs_group(se);
> >  	}
> >
> > -	if (!se)
> > +	if (!se) {
> >  		add_nr_running(rq, 1);
> > +		update_overutilized_status(rq);
> > +	}
>
> I'm wondering if it makes sense for considering scenarios whether
> other classes cause CPUs in the domain to go above the tipping point.
> Then in that case also, it makes sense not to do EAS in that domain
> because of the overutilization.
>
> I guess task_fits using cpu_util which is PELT only at the moment...
> so may require some other method like aggregation of CFS PELT, with
> RT-PELT and DL running bw or something.
>

So at the moment in cpu_overutilized() we compare cpu_util() to
capacity_of(), which should include RT and IRQ pressure IIRC. But
you're right, we might be able to do more here... Perhaps we could
also use cpu_util_dl(), which is available in sched.h now?
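
For reference, a minimal userspace sketch of the comparison discussed
above. This is not the kernel code: cpu_overutilized_sketch() and
cpu_overutilized_with_dl_sketch() are hypothetical names, utilization and
capacity are passed in as plain numbers instead of being read through
cpu_util(), cpu_util_dl() and capacity_of(), and the margin value of 1280
is an assumption chosen to match the 80% threshold quoted in the
changelog. The second helper only illustrates the idea of adding the DL
contribution to the CFS utilization before comparing.

```c
#include <assert.h>
#include <stdbool.h>

#define SCHED_CAPACITY_SCALE	1024UL

/* Assumed margin: 1024 / 1280 = 0.8, i.e. the 80% tipping point. */
#define CAPACITY_MARGIN		1280UL

static_assert(SCHED_CAPACITY_SCALE * 100 / CAPACITY_MARGIN == 80,
	      "margin must correspond to an 80% threshold");

/*
 * Hypothetical stand-in for cpu_overutilized(): true when util exceeds
 * 80% of capacity. Both values are on the 0..1024 capacity scale.
 */
static bool cpu_overutilized_sketch(unsigned long util, unsigned long capacity)
{
	return capacity * SCHED_CAPACITY_SCALE < util * CAPACITY_MARGIN;
}

/*
 * Hypothetical variant: add the DL utilization (what cpu_util_dl() would
 * report) to the CFS utilization before doing the same comparison.
 */
static bool cpu_overutilized_with_dl_sketch(unsigned long util_cfs,
					    unsigned long util_dl,
					    unsigned long capacity)
{
	return cpu_overutilized_sketch(util_cfs + util_dl, capacity);
}
```

With a capacity of 1024, a utilization of 820 is past the tipping point
while 819 is not. Whether the DL contribution is already reflected in
capacity_of() is not settled by the discussion above, so the second
helper should be read as an illustration only.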