From: Vincent Guittot <vincent.guittot@linaro.org>
Date: Thu, 19 Sep 2019 10:00:13 +0200
Subject: Re: [PATCH 1/1] sched/eas: introduce system-wide overutil indicator
To: YT Chang <yt.chang@mediatek.com>
Cc: Peter Zijlstra, Matthias Brugger, wsd_upstream@mediatek.com,
    linux-kernel <linux-kernel@vger.kernel.org>, LAK,
    linux-mediatek@lists.infradead.org
In-Reply-To: <1568877622-28073-1-git-send-email-yt.chang@mediatek.com>

On Thu, 19 Sep 2019 at 09:20, YT Chang <yt.chang@mediatek.com> wrote:
>
> When the system is overutilization, the load-balance crossing

s/overutilization/overutilized/

> clusters will be triggered and scheduler will not use energy
> aware scheduling to choose CPUs.
>
> The overutilization means the loading of ANY CPUs

s/ANY/any/

> exceeds threshold (80%).
>
> However, only 1 heavy task or while-1 program will run on highest
> capacity CPUs and it still result to trigger overutilization. So
> the system will not use Energy Aware scheduling.
>
> To avoid it, a system-wide over-utilization indicator to trigger
> load-balance cross clusters.

The current rd->overutilized is already system-wide: as soon as one CPU
is overutilized, the whole system is considered overutilized, whereas
you would like a finer-grained overutilization signal.

I remember a patch that proposed per-sched_domain overutilization
detection: load_balance at one sched_domain level was enabled only if
the child level was not able to handle the overutilization, and energy
aware scheduling was still used in the other sched_domains. Roughly the
idea sketched below.
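(Untested sketch, only to illustrate that per-domain idea: the
sd->overutilized field and update_sd_overutilized() do not exist today
and are made up here.)

	/*
	 * Illustration only: mark a sched_domain overutilized when any
	 * CPU it spans is overutilized. load_balance() at the parent
	 * level would then only be enabled once this level could not
	 * handle the overutilization itself, and EAS would keep being
	 * used in the domains that are not overutilized.
	 */
	static void update_sd_overutilized(struct sched_domain *sd)
	{
		bool overutilized = false;
		int cpu;

		for_each_cpu(cpu, sched_domain_span(sd)) {
			if (cpu_overutilized(cpu)) {
				overutilized = true;
				break;
			}
		}

		WRITE_ONCE(sd->overutilized, overutilized);
	}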
>
> The policy is:
>         The loading of "ALL CPUs in the highest capacity"
>                 exceeds threshold(80%) or
>         The loading of "Any CPUs not in the highest capacity"
>                 exceed threshold(80%)

Do you have use cases or figures that show a benefit from this change?

>
> Signed-off-by: YT Chang <yt.chang@mediatek.com>
> ---
>  kernel/sched/fair.c | 76 +++++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 65 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 036be95..f4c3d70 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5182,10 +5182,71 @@ static inline bool cpu_overutilized(int cpu)
>  static inline void update_overutilized_status(struct rq *rq)
>  {
>         if (!READ_ONCE(rq->rd->overutilized) && cpu_overutilized(rq->cpu)) {
> -               WRITE_ONCE(rq->rd->overutilized, SG_OVERUTILIZED);
> -               trace_sched_overutilized_tp(rq->rd, SG_OVERUTILIZED);
> +               if (capacity_orig_of(cpu_of(rq)) < rq->rd->max_cpu_capacity) {
> +                       WRITE_ONCE(rq->rd->overutilized, SG_OVERUTILIZED);
> +                       trace_sched_overutilized_tp(rq->rd, SG_OVERUTILIZED);
> +               }
>         }
>  }
> +
> +static
> +void update_system_overutilized(struct sched_domain *sd, struct cpumask *cpus)
> +{
> +       unsigned long group_util;
> +       bool intra_overutil = false;
> +       unsigned long max_capacity;
> +       struct sched_group *group = sd->groups;
> +       struct root_domain *rd;
> +       int this_cpu;
> +       bool overutilized;
> +       int i;
> +
> +       this_cpu = smp_processor_id();
> +       rd = cpu_rq(this_cpu)->rd;
> +       overutilized = READ_ONCE(rd->overutilized);
> +       max_capacity = rd->max_cpu_capacity;
> +
> +       do {
> +               group_util = 0;
> +               for_each_cpu_and(i, sched_group_span(group), cpus) {
> +                       group_util += cpu_util(i);
> +                       if (cpu_overutilized(i)) {
> +                               if (capacity_orig_of(i) < max_capacity) {
> +                                       intra_overutil = true;
> +                                       break;
> +                               }
> +                       }
> +               }
> +
> +               /*
> +                * A capacity base hint for over-utilization.
> +                * Not to trigger system overutiled if heavy tasks
> +                * in Big.cluster, so
> +                * add the free room(20%) of Big.cluster is impacted which means
> +                * system-wide over-utilization,
> +                * that considers whole cluster not single cpu
> +                */
> +               if (group->group_weight > 1 && (group->sgc->capacity * 1024 <
> +                                               group_util * capacity_margin)) {
> +                       intra_overutil = true;
> +                       break;
> +               }
> +
> +               group = group->next;
> +
> +       } while (group != sd->groups && !intra_overutil);
> +
> +       if (overutilized != intra_overutil) {
> +               if (intra_overutil == true) {
> +                       WRITE_ONCE(rd->overutilized, SG_OVERUTILIZED);
> +                       trace_sched_overutilized_tp(rd, SG_OVERUTILIZED);
> +               } else {
> +                       WRITE_ONCE(rd->overutilized, 0);
> +                       trace_sched_overutilized_tp(rd, 0);
> +               }
> +       }
> +}
> +
>  #else
>  static inline void update_overutilized_status(struct rq *rq) { }
>  #endif
> @@ -8242,15 +8303,6 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
>
>                 /* update overload indicator if we are at root domain */
>                 WRITE_ONCE(rd->overload, sg_status & SG_OVERLOAD);
> -
> -               /* Update over-utilization (tipping point, U >= 0) indicator */
> -               WRITE_ONCE(rd->overutilized, sg_status & SG_OVERUTILIZED);
> -               trace_sched_overutilized_tp(rd, sg_status & SG_OVERUTILIZED);
> -       } else if (sg_status & SG_OVERUTILIZED) {
> -               struct root_domain *rd = env->dst_rq->rd;
> -
> -               WRITE_ONCE(rd->overutilized, SG_OVERUTILIZED);
> -               trace_sched_overutilized_tp(rd, SG_OVERUTILIZED);
>         }
>  }
>
> @@ -8476,6 +8528,8 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
>          */
>         update_sd_lb_stats(env, &sds);
>
> +       update_system_overutilized(env->sd, env->cpus);

This should be called only if sched_energy_enabled() is true (see the
sketch at the end of this mail).

> +
>         if (sched_energy_enabled()) {
>                 struct root_domain *rd = env->dst_rq->rd;
>
> --
> 1.9.1
>
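To be concrete about the sched_energy_enabled() comment above, I mean
something like this in find_busiest_group() (sketch only):

	/* Track system-wide overutilization only when EAS is in use */
	if (sched_energy_enabled())
		update_system_overutilized(env->sd, env->cpus);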