From: Joel Fernandes
Date: Fri, 20 Apr 2018 01:13:07 -0700
Subject: Re: [RFC PATCH v2 3/6] sched: Add over-utilization/tipping point indicator
To: Quentin Perret
Cc: Dietmar Eggemann, LKML, Peter Zijlstra, Thara Gopinath, Linux PM,
 Morten Rasmussen, Chris Redpath, Patrick Bellasi, Valentin Schneider,
 "Rafael J . Wysocki", Greg Kroah-Hartman, Vincent Guittot, Viresh Kumar,
 Todd Kjos, Juri Lelli, Steve Muckle, Eduardo Valentin
In-Reply-To: <20180418111729.GB6783@e108498-lin.cambridge.arm.com>
References: <20180406153607.17815-1-dietmar.eggemann@arm.com>
 <20180406153607.17815-4-dietmar.eggemann@arm.com>
 <20180418111729.GB6783@e108498-lin.cambridge.arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 18, 2018 at 4:17 AM, Quentin Perret wrote:
> On Friday 13 Apr 2018 at 16:56:39 (-0700), Joel Fernandes wrote:
>> Hi,
>>
>> On Fri, Apr 6, 2018 at 8:36 AM, Dietmar Eggemann wrote:
>> > From: Thara Gopinath
>> >
>> > Energy-aware scheduling should only operate when the system is not
>> > overutilized. There must be cpu time available to place tasks based on
>> > utilization in an energy-aware fashion, i.e. to pack tasks on
>> > energy-efficient cpus without harming the overall throughput.
>> >
>> > In case the system operates above this tipping point the tasks have to
>> > be placed based on task and cpu load in the classical way of spreading
>> > tasks across as many cpus as possible.
>> >
>> > The point in which a system switches from being not overutilized to
>> > being overutilized is called the tipping point.
>> >
>> > Such a tipping point indicator on a sched domain as the system
>> > boundary is introduced here. As soon as one cpu of a sched domain is
>> > overutilized the whole sched domain is declared overutilized as well.
>> > A cpu becomes overutilized when its utilization is higher than 80%
>> > (capacity_margin) of its capacity.
>> >
>> > The implementation takes advantage of the shared sched domain which is
>> > shared across all per-cpu views of a sched domain level. The new
>> > overutilized flag is placed in this shared sched domain.
>> >
>> > Load balancing is skipped in case the energy model is present and the
>> > sched domain is not overutilized because under this condition the
>> > predominantly load-per-capacity driven load-balancer should not
>> > interfere with the energy-aware wakeup placement based on utilization.
>> >
>> > In case the total utilization of a sched domain is greater than the
>> > total sched domain capacity the overutilized flag is set at the parent
>> > sched domain level to let other sched groups help getting rid of the
>> > overutilization of cpus.
>> >
>> > Signed-off-by: Thara Gopinath
>> > Signed-off-by: Dietmar Eggemann
>> > ---
>> >  include/linux/sched/topology.h |  1 +
>> >  kernel/sched/fair.c            | 62 ++++++++++++++++++++++++++++++++++++++++--
>> >  kernel/sched/sched.h           |  1 +
>> >  kernel/sched/topology.c        | 12 +++-----
>> >  4 files changed, 65 insertions(+), 11 deletions(-)
>> >
>> > diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
>> > index 26347741ba50..dd001c232646 100644
>> > --- a/include/linux/sched/topology.h
>> > +++ b/include/linux/sched/topology.h
>> > @@ -72,6 +72,7 @@ struct sched_domain_shared {
>> >  	atomic_t	ref;
>> >  	atomic_t	nr_busy_cpus;
>> >  	int		has_idle_cores;
>> > +	int		overutilized;
>> >  };
>> >
>> >  struct sched_domain {
>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> > index 0a76ad2ef022..6960e5ef3c14 100644
>> > --- a/kernel/sched/fair.c
>> > +++ b/kernel/sched/fair.c
>> > @@ -5345,6 +5345,28 @@ static inline void hrtick_update(struct rq *rq)
>> >  }
>> >  #endif
>> >
>> > +#ifdef CONFIG_SMP
>> > +static inline int cpu_overutilized(int cpu);
>> > +
>> > +static inline int sd_overutilized(struct sched_domain *sd)
>> > +{
>> > +	return READ_ONCE(sd->shared->overutilized);
>> > +}
>> > +
>> > +static inline void update_overutilized_status(struct rq *rq)
>> > +{
>> > +	struct sched_domain *sd;
>> > +
>> > +	rcu_read_lock();
>> > +	sd = rcu_dereference(rq->sd);
>> > +	if (sd && !sd_overutilized(sd) && cpu_overutilized(rq->cpu))
>> > +		WRITE_ONCE(sd->shared->overutilized, 1);
>> > +	rcu_read_unlock();
>> > +}
>> > +#else
>> > +static inline void update_overutilized_status(struct rq *rq) {}
>> > +#endif /* CONFIG_SMP */
>> > +
>> >  /*
>> >   * The enqueue_task method is called before nr_running is
>> >   * increased. Here we update the fair scheduling stats and
>> > @@ -5394,8 +5416,10 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>> >  		update_cfs_group(se);
>> >  	}
>> >
>> > -	if (!se)
>> > +	if (!se) {
>> >  		add_nr_running(rq, 1);
>> > +		update_overutilized_status(rq);
>> > +	}
>>
>> I'm wondering if it makes sense to consider scenarios where other
>> classes cause CPUs in the domain to go above the tipping point. In
>> that case it also makes sense not to do EAS in that domain because
>> of the overutilization.
>>
>> I guess task_fits uses cpu_util, which is PELT only at the moment...
>> so it may require some other method, like aggregating CFS PELT with
>> RT-PELT and DL running bw or something.
>>
>
> So at the moment in cpu_overutilized() we compare cpu_util() to
> capacity_of() which should include RT and IRQ pressure IIRC. But
> you're right, we might be able to do more here... Perhaps we
> could also use cpu_util_dl() which is available in sched.h now?

Yes, that should be OK. Once the RT utilization stuff is available, it
can be included in the equation as well (for now you could probably use
rt_avg).

Another crazy idea is to check the contribution of the higher classes in
one shot with (capacity_orig_of - capacity_of), although I think that
method would be less instantaneous/accurate.

thanks,

- Joel
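
To make the options discussed above concrete, here is a rough userspace
sketch, not kernel code. It compares the CFS-only tipping point check with
the two alternatives: summing per-class utilization, and the one-shot
(capacity_orig - capacity) estimate. The struct, its field names and the
three helper functions are hypothetical. The margin of 1280/1024 is an
assumption meant to encode the "80% (capacity_margin)" threshold from the
changelog, and comparing the aggregated utilization against the original
capacity (to avoid counting RT/DL pressure twice) is my own choice, not
something the patch does.

```c
#include <stdio.h>

#define CAPACITY_SCALE  1024UL
/* Assumed margin: 1280/1024 = 1.25, i.e. overutilized above 80% of capacity. */
#define CAPACITY_MARGIN 1280UL

/* Hypothetical per-CPU snapshot; field names are illustrative only. */
struct cpu_snapshot {
	unsigned long capacity_orig;	/* capacity before RT/DL/IRQ pressure */
	unsigned long capacity;		/* capacity left over for CFS */
	unsigned long util_cfs;		/* CFS (PELT) utilization */
	unsigned long util_dl;		/* DL utilization */
	unsigned long util_rt;		/* RT utilization */
};

/* CFS-only check: CFS utilization above 80% of the remaining capacity. */
int overutilized_cfs_only(const struct cpu_snapshot *c)
{
	return c->capacity * CAPACITY_SCALE < c->util_cfs * CAPACITY_MARGIN;
}

/* Aggregate all classes and compare against the original capacity. */
int overutilized_all_classes(const struct cpu_snapshot *c)
{
	unsigned long util = c->util_cfs + c->util_dl + c->util_rt;

	return c->capacity_orig * CAPACITY_SCALE < util * CAPACITY_MARGIN;
}

/* One-shot estimate: treat the lost capacity as higher-class utilization. */
int overutilized_one_shot(const struct cpu_snapshot *c)
{
	unsigned long pressure = 0;

	if (c->capacity_orig > c->capacity)
		pressure = c->capacity_orig - c->capacity;

	return c->capacity_orig * CAPACITY_SCALE <
	       (c->util_cfs + pressure) * CAPACITY_MARGIN;
}
```

With capacity_orig = 1024, capacity = 800, util_cfs = 500 and
util_dl = util_rt = 200, only the aggregated check reports
overutilization. The one-shot estimate sees just 224 units of pressure
because the capacity signal lags behind the per-class signals, which is
the accuracy concern mentioned above.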