Date:	Tue, 20 Mar 2018 04:08:02 -0700
From:	tip-bot for Patrick Bellasi
Message-ID:
Cc:	peterz@infradead.org, hpa@zytor.com, dietmar.eggemann@arm.com,
	torvalds@linux-foundation.org, pjt@google.com, vincent.guittot@linaro.org,
	morten.rasmussen@arm.com, rafael.j.wysocki@intel.com, smuckle@google.com,
	tglx@linutronix.de, viresh.kumar@linaro.org, linux-kernel@vger.kernel.org,
	joelaf@google.com, juri.lelli@redhat.com, patrick.bellasi@arm.com,
	mingo@kernel.org, tkjos@android.com
In-Reply-To: <20180309095245.11071-3-patrick.bellasi@arm.com>
References: <20180309095245.11071-3-patrick.bellasi@arm.com>
To:	linux-tip-commits@vger.kernel.org
Subject: [tip:sched/core] sched/fair: Use util_est in LB and WU paths
Git-Commit-ID: f9be3e5961c5554879a491961187472e923f5ee0
X-Mailing-List: linux-kernel@vger.kernel.org

Commit-ID:  f9be3e5961c5554879a491961187472e923f5ee0
Gitweb:     https://git.kernel.org/tip/f9be3e5961c5554879a491961187472e923f5ee0
Author:     Patrick Bellasi
AuthorDate: Fri, 9 Mar 2018 09:52:43 +0000
Committer:  Ingo Molnar
CommitDate: Tue, 20 Mar 2018 08:11:07 +0100

sched/fair: Use util_est in LB and WU paths

When the scheduler looks at the CPU utilization, the current PELT value
for a CPU is returned straight away. In certain scenarios this can have
undesired side effects on task placement. For example, since task
utilization is decayed at wakeup time, when a long-sleeping big task is
enqueued it does not immediately add a significant contribution to the
target CPU. As a result, a race window opens in which other tasks can be
placed on the same CPU while it is still considered relatively empty.
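As a rough illustration of the decay involved (a Python sketch, not kernel code; `decayed_util()` and its inputs are hypothetical, while the ~32 ms utilization half-life is PELT's actual behaviour):

```python
# Toy model of PELT decay while a task sleeps (illustrative only).
# PELT roughly halves a sleeping task's utilization contribution every
# 32 ms, so a "big" task that slept for a while looks almost empty on
# its target CPU at wakeup time.

PELT_HALF_LIFE_MS = 32  # PELT utilization half-life

def decayed_util(util: float, sleep_ms: float) -> float:
    """Utilization contribution left after sleeping for sleep_ms."""
    return util * 0.5 ** (sleep_ms / PELT_HALF_LIFE_MS)

# A task with utilization 800 (out of 1024) that slept 100 ms now
# contributes less than 100, so its CPU looks nearly idle at wakeup:
print(decayed_util(800, 100))  # roughly 92
```

This is the gap the patch closes: the CPU hosting such a task is reported as nearly idle even though a large task is about to run there.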
To reduce this kind of race condition, this patch introduces the support
required to integrate the CPU's estimated utilization in the wakeup
path, via cpu_util_wake(), as well as in the load-balance path, via
cpu_util(), which is used by update_sg_lb_stats().

The estimated utilization of a CPU is defined to be the maximum between
its PELT utilization and the sum of the estimated utilization (at their
previous dequeue time) of all the tasks currently RUNNABLE on that CPU.
This properly represents the spare capacity of a CPU which, for example,
has just started running a big task after a long sleep period.

Signed-off-by: Patrick Bellasi
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Dietmar Eggemann
Cc: Joel Fernandes
Cc: Juri Lelli
Cc: Linus Torvalds
Cc: Morten Rasmussen
Cc: Paul Turner
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Steve Muckle
Cc: Thomas Gleixner
Cc: Todd Kjos
Cc: Vincent Guittot
Cc: Viresh Kumar
Link: http://lkml.kernel.org/r/20180309095245.11071-3-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar
---
 kernel/sched/fair.c | 84 ++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 70 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 22b59a7facd2..570b8d056282 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6432,11 +6432,13 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	return target;
 }
 
-/*
- * cpu_util returns the amount of capacity of a CPU that is used by CFS
- * tasks. The unit of the return value must be the one of capacity so we can
- * compare the utilization with the capacity of the CPU that is available for
- * CFS task (ie cpu_capacity).
+/**
+ * Amount of capacity of a CPU that is (estimated to be) used by CFS tasks
+ * @cpu: the CPU to get the utilization of
+ *
+ * The unit of the return value must be the one of capacity so we can compare
+ * the utilization with the capacity of the CPU that is available for CFS
+ * tasks (i.e. cpu_capacity).
  *
  * cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus the
  * recent utilization of currently non-runnable tasks on a CPU. It represents
@@ -6447,6 +6449,14 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
  * current capacity (capacity_curr <= capacity_orig) of the CPU because it is
  * the running time on this CPU scaled by capacity_curr.
  *
+ * The estimated utilization of a CPU is defined to be the maximum between its
+ * cfs_rq.avg.util_avg and the sum of the estimated utilization of the tasks
+ * currently RUNNABLE on that CPU.
+ * This allows us to properly represent the expected utilization of a CPU
+ * which has just started running a big task after a long sleep period. At the
+ * same time, however, it preserves the benefits of the "blocked utilization"
+ * in describing the potential for other tasks waking up on the same CPU.
+ *
  * Nevertheless, cfs_rq.avg.util_avg can be higher than capacity_curr or even
  * higher than capacity_orig because of unfortunate rounding in
  * cfs.avg.util_avg or just after migrating tasks and new task wakeups until
@@ -6457,13 +6467,21 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
  * available capacity. We allow utilization to overshoot capacity_curr (but not
  * capacity_orig) as it is useful for predicting the capacity required after
  * task migrations (scheduler-driven DVFS).
+ *
+ * Return: the (estimated) utilization for the specified CPU
  */
-static unsigned long cpu_util(int cpu)
+static inline unsigned long cpu_util(int cpu)
 {
-	unsigned long util = cpu_rq(cpu)->cfs.avg.util_avg;
-	unsigned long capacity = capacity_orig_of(cpu);
+	struct cfs_rq *cfs_rq;
+	unsigned int util;
+
+	cfs_rq = &cpu_rq(cpu)->cfs;
+	util = READ_ONCE(cfs_rq->avg.util_avg);
+
+	if (sched_feat(UTIL_EST))
+		util = max(util, READ_ONCE(cfs_rq->avg.util_est.enqueued));
 
-	return (util >= capacity) ? capacity : util;
+	return min_t(unsigned long, util, capacity_orig_of(cpu));
 }
 
 /*
@@ -6472,16 +6490,54 @@ static unsigned long cpu_util(int cpu)
  */
 static unsigned long cpu_util_wake(int cpu, struct task_struct *p)
 {
-	unsigned long util, capacity;
+	struct cfs_rq *cfs_rq;
+	unsigned int util;
 
 	/* Task has no contribution or is new */
-	if (cpu != task_cpu(p) || !p->se.avg.last_update_time)
+	if (cpu != task_cpu(p) || !READ_ONCE(p->se.avg.last_update_time))
 		return cpu_util(cpu);
 
-	capacity = capacity_orig_of(cpu);
-	util = max_t(long, cpu_rq(cpu)->cfs.avg.util_avg - task_util(p), 0);
+	cfs_rq = &cpu_rq(cpu)->cfs;
+	util = READ_ONCE(cfs_rq->avg.util_avg);
+
+	/* Discount task's blocked util from CPU's util */
+	util -= min_t(unsigned int, util, task_util(p));
 
-	return (util >= capacity) ? capacity : util;
+	/*
+	 * Covered cases:
+	 *
+	 * a) if *p is the only task sleeping on this CPU, then:
+	 *      cpu_util (== task_util) > util_est (== 0)
+	 *    and thus we return:
+	 *      cpu_util_wake = (cpu_util - task_util) = 0
+	 *
+	 * b) if other tasks are SLEEPING on this CPU, which is now exiting
+	 *    IDLE, then:
+	 *      cpu_util >= task_util
+	 *      cpu_util > util_est (== 0)
+	 *    and thus we discount *p's blocked utilization to return:
+	 *      cpu_util_wake = (cpu_util - task_util) >= 0
+	 *
+	 * c) if other tasks are RUNNABLE on that CPU and
+	 *      util_est > cpu_util
+	 *    then we use util_est since it returns a more restrictive
+	 *    estimation of the spare capacity on that CPU, by just
+	 *    considering the expected utilization of tasks already
+	 *    runnable on that CPU.
+	 *
+	 * Cases a) and b) are covered by the above code, while case c) is
+	 * covered by the following code when estimated utilization is
+	 * enabled.
+	 */
+	if (sched_feat(UTIL_EST))
+		util = max(util, READ_ONCE(cfs_rq->avg.util_est.enqueued));
+
+	/*
+	 * Utilization (estimated) can exceed the CPU capacity, thus let's
+	 * clamp to the maximum CPU capacity to ensure consistency with
+	 * the cpu_util call.
+	 */
+	return min_t(unsigned long, util, capacity_orig_of(cpu));
 }
 
 /*
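
The interplay of the two paths above can be condensed into a small model (an illustrative Python sketch with simplified stand-in parameters; the authoritative logic is the C in the diff):

```python
# Minimal model of the patched cpu_util()/cpu_util_wake() behaviour.
# util_avg, util_est, task_util and capacity_orig stand in for the
# per-CPU/per-task values the kernel reads; UTIL_EST is assumed enabled.

def cpu_util(util_avg: int, util_est: int, capacity_orig: int) -> int:
    """Estimated CPU utilization: max of PELT util and the enqueued
    util_est sum, clamped to the CPU's original capacity."""
    return min(max(util_avg, util_est), capacity_orig)

def cpu_util_wake(util_avg: int, util_est: int, task_util: int,
                  capacity_orig: int) -> int:
    """CPU utilization with the waking task's blocked contribution removed."""
    # Discount the task's blocked util, never going below zero (cases a, b)...
    util = util_avg - min(util_avg, task_util)
    # ...let util_est of still-RUNNABLE tasks dominate if larger (case c)...
    util = max(util, util_est)
    # ...and clamp to capacity for consistency with cpu_util().
    return min(util, capacity_orig)

# Case a: *p was the only task sleeping on the CPU -> CPU looks empty again.
print(cpu_util_wake(300, 0, 300, 1024))   # 0
# Case c: other RUNNABLE tasks' util_est gives the tighter estimate.
print(cpu_util_wake(100, 400, 50, 1024))  # 400
```

Note how the clamp to capacity_orig in both helpers mirrors the min_t() calls in the patch, keeping the wakeup and load-balance paths mutually consistent.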