References: <20180406153607.17815-1-dietmar.eggemann@arm.com> <20180406153607.17815-4-dietmar.eggemann@arm.com> <20180418111729.GB6783@e108498-lin.cambridge.arm.com>
From: Joel Fernandes
Date: Fri, 20 Apr 2018 01:14:35 -0700
Subject: Re: [RFC PATCH v2 3/6] sched: Add over-utilization/tipping point indicator
To: Quentin Perret
Cc: Dietmar Eggemann, LKML, Peter Zijlstra, Thara Gopinath, Linux PM, Morten Rasmussen, Chris Redpath, Patrick Bellasi, Valentin Schneider, "Rafael J . Wysocki", Greg Kroah-Hartman, Vincent Guittot, Viresh Kumar, Todd Kjos, Juri Lelli, Steve Muckle, Eduardo Valentin
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 20, 2018 at 1:13 AM, Joel Fernandes wrote:
> On Wed, Apr 18, 2018 at 4:17 AM, Quentin Perret wrote:
>> On Friday 13 Apr 2018 at 16:56:39 (-0700), Joel Fernandes wrote:
>>> Hi,
>>>
>>> On Fri, Apr 6, 2018 at 8:36 AM, Dietmar Eggemann
>>> wrote:
>>> > From: Thara Gopinath
>>> >
>>> > Energy-aware scheduling should only operate when the system is not
>>> > overutilized. There must be cpu time available to place tasks based on
>>> > utilization in an energy-aware fashion, i.e. to pack tasks on
>>> > energy-efficient cpus without harming the overall throughput.
>>> >
>>> > In case the system operates above this tipping point the tasks have to
>>> > be placed based on task and cpu load in the classical way of spreading
>>> > tasks across as many cpus as possible.
>>> >
>>> > The point in which a system switches from being not overutilized to
>>> > being overutilized is called the tipping point.
>>> >
>>> > Such a tipping point indicator on a sched domain as the system
>>> > boundary is introduced here. As soon as one cpu of a sched domain is
>>> > overutilized the whole sched domain is declared overutilized as well.
>>> > A cpu becomes overutilized when its utilization is higher than 80%
>>> > (capacity_margin) of its capacity.
>>> >
>>> > The implementation takes advantage of the shared sched domain which is
>>> > shared across all per-cpu views of a sched domain level. The new
>>> > overutilized flag is placed in this shared sched domain.
>>> >
>>> > Load balancing is skipped in case the energy model is present and the
>>> > sched domain is not overutilized because under this condition the
>>> > predominantly load-per-capacity driven load-balancer should not
>>> > interfere with the energy-aware wakeup placement based on utilization.
>>> >
>>> > In case the total utilization of a sched domain is greater than the
>>> > total sched domain capacity the overutilized flag is set at the parent
>>> > sched domain level to let other sched groups help getting rid of the
>>> > overutilization of cpus.
>>> >
>>> > Signed-off-by: Thara Gopinath
>>> > Signed-off-by: Dietmar Eggemann
>>> > ---
>>> >  include/linux/sched/topology.h |  1 +
>>> >  kernel/sched/fair.c            | 62 ++++++++++++++++++++++++++++++++++++++++--
>>> >  kernel/sched/sched.h           |  1 +
>>> >  kernel/sched/topology.c        | 12 +++-----
>>> >  4 files changed, 65 insertions(+), 11 deletions(-)
>>> >
>>> > diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
>>> > index 26347741ba50..dd001c232646 100644
>>> > --- a/include/linux/sched/topology.h
>>> > +++ b/include/linux/sched/topology.h
>>> > @@ -72,6 +72,7 @@ struct sched_domain_shared {
>>> >  	atomic_t ref;
>>> >  	atomic_t nr_busy_cpus;
>>> >  	int has_idle_cores;
>>> > +	int overutilized;
>>> >  };
>>> >
>>> >  struct sched_domain {
>>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> > index 0a76ad2ef022..6960e5ef3c14 100644
>>> > --- a/kernel/sched/fair.c
>>> > +++ b/kernel/sched/fair.c
>>> > @@ -5345,6 +5345,28 @@ static inline void hrtick_update(struct rq *rq)
>>> >  }
>>> >  #endif
>>> >
>>> > +#ifdef CONFIG_SMP
>>> > +static inline int cpu_overutilized(int cpu);
>>> > +
>>> > +static inline int sd_overutilized(struct sched_domain *sd)
>>> > +{
>>> > +	return READ_ONCE(sd->shared->overutilized);
>>> > +}
>>> > +
>>> > +static inline void update_overutilized_status(struct rq *rq)
>>> > +{
>>> > +	struct sched_domain *sd;
>>> > +
>>> > +	rcu_read_lock();
>>> > +	sd = rcu_dereference(rq->sd);
>>> > +	if (sd && !sd_overutilized(sd) && cpu_overutilized(rq->cpu))
>>> > +		WRITE_ONCE(sd->shared->overutilized, 1);
>>> > +	rcu_read_unlock();
>>> > +}
>>> > +#else
>>> > +static inline void update_overutilized_status(struct rq *rq) {}
>>> > +#endif /* CONFIG_SMP */
>>> > +
>>> >  /*
>>> >   * The enqueue_task method is called before nr_running is
>>> >   * increased. Here we update the fair scheduling stats and
>>> > @@ -5394,8 +5416,10 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>>> >  		update_cfs_group(se);
>>> >  	}
>>> >
>>> > -	if (!se)
>>> > +	if (!se) {
>>> >  		add_nr_running(rq, 1);
>>> > +		update_overutilized_status(rq);
>>> > +	}
>>>
>>> I'm wondering if it makes sense for considering scenarios whether
>>> other classes cause CPUs in the domain to go above the tipping point.
>>> Then in that case also, it makes sense not to do EAS in that domain
>>> because of the overutilization.
>>>
>>> I guess task_fits using cpu_util which is PELT only at the moment...
>>> so may require some other method like aggregation of CFS PELT, with
>>> RT-PELT and DL running bw or something.
>>>
>>
>> So at the moment in cpu_overutilized() we compare cpu_util() to
>> capacity_of() which should include RT and IRQ pressure IIRC. But
>> you're right, we might be able to do more here... Perhaps we
>> could also use cpu_util_dl() which is available in sched.h now?
>
> Yes, should be Ok, and then when RT utilization stuff is available,
> then that can be included in the equation as well (probably for now
> you could use rt_avg).
>
> Another crazy idea is to check the contribution of higher classes in
> one-shot with (capacity_orig_of - capacity_of) although I think that
> method would be less instantaneous/accurate.

Just to add to the last point, the capacity_of also factors in the IRQ
contribution if I remember correctly, which is probably a good thing?

- Joel
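
To make the checks discussed above concrete, here is a rough userspace sketch in C. It is not the kernel code. The struct and its fields are hypothetical stand-ins for what cpu_util(), cpu_util_dl(), capacity_of() and capacity_orig_of() would return, and the margin value of 1280 (out of 1024) is an assumption chosen to match the "80% of capacity" wording in the patch description. Whether adding DL utilization on top of capacity_of() would double-count pressure that capacity_of() already reflects is left open here, as it is in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SCHED_CAPACITY_SCALE 1024UL
/* Assumed margin: 1280/1024 = 1.25, so a CPU tips over above 80% of its capacity. */
#define CAPACITY_MARGIN 1280UL

/* Hypothetical per-CPU snapshot; the kernel derives these values from rq state. */
struct cpu_sample {
	unsigned long capacity_orig; /* stands in for capacity_orig_of() */
	unsigned long capacity;      /* stands in for capacity_of() */
	unsigned long util_cfs;      /* stands in for cpu_util() (CFS PELT) */
	unsigned long util_dl;       /* stands in for cpu_util_dl() */
};

/* Tipping-point check from the patch description: CFS util above 80% of capacity. */
static bool cpu_overutilized(const struct cpu_sample *c)
{
	return c->capacity * SCHED_CAPACITY_SCALE < c->util_cfs * CAPACITY_MARGIN;
}

/* Variant floated in the thread: also count deadline utilization. */
static bool cpu_overutilized_with_dl(const struct cpu_sample *c)
{
	return c->capacity * SCHED_CAPACITY_SCALE <
	       (c->util_cfs + c->util_dl) * CAPACITY_MARGIN;
}

/* "One-shot" estimate of the capacity consumed by higher classes and IRQs. */
static unsigned long higher_class_pressure(const struct cpu_sample *c)
{
	return c->capacity_orig > c->capacity ?
	       c->capacity_orig - c->capacity : 0;
}

/* A domain counts as overutilized as soon as one of its CPUs is. */
static bool domain_overutilized(const struct cpu_sample *cpus, size_t n)
{
	assert(cpus != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (cpu_overutilized(&cpus[i]))
			return true;
	}
	return false;
}
```

With a capacity of 1024 and this margin, the first check flips between a utilization of 819 and 820. The pressure helper only reports how much capacity is already gone; it says nothing about which class consumed it.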