From: Patrick Bellasi
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Cc: Ingo Molnar, Peter Zijlstra, "Rafael J .
 Wysocki", Viresh Kumar, Vincent Guittot, Paul Turner, Dietmar Eggemann, Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes, Steve Muckle
Subject: [PATCH v5 2/4] sched/fair: use util_est in LB and WU paths
Date: Thu, 22 Feb 2018 17:01:51 +0000
Message-Id: <20180222170153.673-3-patrick.bellasi@arm.com>
X-Mailer: git-send-email 2.15.1
In-Reply-To: <20180222170153.673-1-patrick.bellasi@arm.com>
References: <20180222170153.673-1-patrick.bellasi@arm.com>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

When the scheduler looks at the CPU utilization, the current PELT value
for a CPU is returned straight away. In certain scenarios this can have
undesired side effects on task placement.

For example, since the task utilization is decayed at wakeup time, when
a long sleeping big task is enqueued it does not immediately add a
significant contribution to the target CPU.
As a result we generate a race condition where other tasks can be placed
on the same CPU while it is still considered relatively empty.

In order to reduce this kind of race condition, this patch introduces the
required support to integrate the usage of the CPU's estimated utilization
in cpu_util_wake() as well as in update_sg_lb_stats().

The estimated utilization of a CPU is defined to be the maximum of its
PELT utilization and the sum of the estimated utilization of the tasks
currently RUNNABLE on that CPU.
This properly represents the spare capacity of a CPU which, for example,
has just started running a big task after a long sleep period.

Signed-off-by: Patrick Bellasi
Reviewed-by: Dietmar Eggemann
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Viresh Kumar
Cc: Paul Turner
Cc: Vincent Guittot
Cc: Morten Rasmussen
Cc: Dietmar Eggemann
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org

---
Changes in v5:
 - always use int instead of long whenever possible (Peter)
 - add missing READ_ONCE barriers (Peter)

Changes in v4:
 - rebased on today's tip/sched/core (commit 460e8c3340a2)
 - ensure cpu_util_wake() is clamped to cpu_capacity_orig (Pavan)

Changes in v3:
 - rebased on today's tip/sched/core (commit 07881166a892)

Changes in v2:
 - rebase on top of v4.15-rc2
 - tested that overhauled PELT code does not affect the util_est
---
 kernel/sched/fair.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 76 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c8526687f107..8364771f7301 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6445,6 +6445,41 @@ static unsigned long cpu_util(int cpu)
 	return (util >= capacity) ? capacity : util;
 }
 
+/**
+ * cpu_util_est: estimated utilization for the specified CPU
+ * @cpu: the CPU to get the estimated utilization for
+ *
+ * The estimated utilization of a CPU is defined to be the maximum between its
+ * PELT's utilization and the sum of the estimated utilization of the tasks
+ * currently RUNNABLE on that CPU.
+ *
+ * This allows to properly represent the expected utilization of a CPU which
+ * has just got a big task running since a long sleep period. At the same time
+ * however it preserves the benefits of the "blocked utilization" in
+ * describing the potential for other tasks waking up on the same CPU.
+ *
+ * Return: the estimated utilization for the specified CPU
+ */
+static inline unsigned long cpu_util_est(int cpu)
+{
+	unsigned int util, util_est;
+	unsigned int capacity;
+	struct cfs_rq *cfs_rq;
+
+	if (!sched_feat(UTIL_EST))
+		return cpu_util(cpu);
+
+	cfs_rq = &cpu_rq(cpu)->cfs;
+	util = READ_ONCE(cfs_rq->avg.util_avg);
+	util_est = READ_ONCE(cfs_rq->avg.util_est.enqueued);
+	util_est = max(util, util_est);
+
+	capacity = capacity_orig_of(cpu);
+	util_est = min(util_est, capacity);
+
+	return util_est;
+}
+
 static inline unsigned long task_util(struct task_struct *p)
 {
 	return p->se.avg.util_avg;
@@ -6469,16 +6504,52 @@ static inline unsigned long task_util_est(struct task_struct *p)
  */
 static unsigned long cpu_util_wake(int cpu, struct task_struct *p)
 {
-	unsigned long util, capacity;
+	unsigned int util, util_est;
+	unsigned int capacity;
 
 	/* Task has no contribution or is new */
 	if (cpu != task_cpu(p) || !p->se.avg.last_update_time)
-		return cpu_util(cpu);
+		return cpu_util_est(cpu);
+
+	/* Discount task's blocked util from CPU's util */
+	util = cpu_util(cpu);
+	util -= min_t(unsigned int, util, task_util(p));
+
+	if (!sched_feat(UTIL_EST))
+		return util;
+
+	/*
+	 * Covered cases:
+	 * - if *p is the only task sleeping on this CPU, then:
+	 *      cpu_util (== task_util) > util_est (== 0)
+	 *   and thus we return:
+	 *      cpu_util_wake = (cpu_util - task_util) = 0
+	 *
+	 * - if other tasks are SLEEPING on the same CPU, which is just waking
+	 *   up, then:
+	 *      cpu_util >= task_util
+	 *      cpu_util > util_est (== 0)
+	 *   and thus we discount *p's blocked utilization to return:
+	 *      cpu_util_wake = (cpu_util - task_util) >= 0
+	 *
+	 * - if other tasks are RUNNABLE on that CPU and
+	 *      util_est > cpu_util
+	 *   then we use util_est since it returns a more restrictive
+	 *   estimation of the spare capacity on that CPU, by just considering
+	 *   the expected utilization of tasks already runnable on that CPU.
+	 */
+	util_est = READ_ONCE(cpu_rq(cpu)->cfs.avg.util_est.enqueued);
+	util = max(util, util_est);
 
+	/*
+	 * Estimated utilization can exceed the CPU capacity, thus let's clamp
+	 * to the maximum CPU capacity to ensure consistency with other
+	 * cpu_util[_est] calls.
+	 */
 	capacity = capacity_orig_of(cpu);
-	util = max_t(long, cpu_rq(cpu)->cfs.avg.util_avg - task_util(p), 0);
+	util = min(util, capacity);
 
-	return (util >= capacity) ? capacity : util;
+	return util;
 }
 
 /*
@@ -8005,7 +8076,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 			load = source_load(i, load_idx);
 
 		sgs->group_load += load;
-		sgs->group_util += cpu_util(i);
+		sgs->group_util += cpu_util_est(i);
 		sgs->sum_nr_running += rq->cfs.h_nr_running;
 
 		nr_running = rq->nr_running;
-- 
2.15.1
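For anyone who wants to check the three "covered cases" from the cpu_util_wake() comment by hand, the arithmetic can be modelled in userspace. This is only a sketch under assumptions, not kernel code: struct cpu_model, min_u(), max_u(), the model_*() helpers and self_check() are hypothetical names introduced here, the UTIL_EST feature is assumed to be enabled, and the waking task is assumed to be a non-new task that last ran on the CPU being queried. The utilization numbers are made up for illustration.

```c
#include <assert.h>

/*
 * Hypothetical userspace stand-in for the per-CPU values the patch reads.
 * None of these names exist in the kernel; they only mirror the fields.
 */
struct cpu_model {
	unsigned int util_avg;      /* mirrors cfs_rq->avg.util_avg (PELT) */
	unsigned int est_enqueued;  /* mirrors cfs_rq->avg.util_est.enqueued */
	unsigned int capacity_orig; /* mirrors capacity_orig_of(cpu) */
};

static unsigned int min_u(unsigned int a, unsigned int b) { return a < b ? a : b; }
static unsigned int max_u(unsigned int a, unsigned int b) { return a > b ? a : b; }

/* Mirrors cpu_util(): PELT utilization clamped to the CPU capacity. */
static unsigned int model_cpu_util(const struct cpu_model *c)
{
	return min_u(c->util_avg, c->capacity_orig);
}

/* Mirrors cpu_util_est(): max(PELT, enqueued estimate), clamped. */
static unsigned int model_cpu_util_est(const struct cpu_model *c)
{
	return min_u(max_u(c->util_avg, c->est_enqueued), c->capacity_orig);
}

/* Mirrors the waking-task path of cpu_util_wake(). */
static unsigned int model_cpu_util_wake(const struct cpu_model *c,
					unsigned int task_util)
{
	unsigned int util = model_cpu_util(c);

	/* Discount the task's blocked utilization, never going below zero. */
	util -= min_u(util, task_util);
	/* Tasks already runnable may be expected to use more than that. */
	util = max_u(util, c->est_enqueued);

	return min_u(util, c->capacity_orig);
}

/* Walks through the covered cases with illustrative numbers. */
static void self_check(void)
{
	/* Case 1: *p is the only task sleeping on the CPU. */
	struct cpu_model only_p = { .util_avg = 200, .est_enqueued = 0,
				    .capacity_orig = 1024 };
	assert(model_cpu_util_wake(&only_p, 200) == 0);

	/* Case 2: other tasks are sleeping on the same CPU. */
	struct cpu_model sleepers = { .util_avg = 500, .est_enqueued = 0,
				      .capacity_orig = 1024 };
	assert(model_cpu_util_wake(&sleepers, 200) == 300);

	/* Case 3: other tasks are runnable and util_est > cpu_util. */
	struct cpu_model runnable = { .util_avg = 300, .est_enqueued = 700,
				      .capacity_orig = 1024 };
	assert(model_cpu_util_wake(&runnable, 200) == 700);

	/* Estimates above the CPU capacity are clamped. */
	struct cpu_model over = { .util_avg = 300, .est_enqueued = 1200,
				  .capacity_orig = 1024 };
	assert(model_cpu_util_est(&over) == 1024);
}
```

Calling self_check() from a test harness exercises all the cases. The third case is the one the commit message is about: the decayed PELT value alone would report 100 after the discount, while the enqueued estimate keeps the CPU from looking nearly empty.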